Read the first block of HDD

In this post, I write about my first experiment with the INT 13H extensions to read the Boot Record of the HDD. I know that I can use the INT 13H extensions for LBA because I verified it in the previous post. To read with the INT 13H extensions, the registers must be set as follows:

  • AH = 0x42 (tells INT 13H that we want to perform LBA read)
  • DL = 0x80 (tells INT 13H that we want to read from the 1st HDD)
  • DS:SI = pointer to the LBA disk address packet
Fig. A - LBA disk address packet

The LBA disk address packet structure looks like the one in Fig. A. There are two versions of it: the first is 16 bytes long (signalled with 0x10 at offset 0); the second is 24 bytes long (signalled with 0x18 at offset 0). We are going to use the first version, so the first byte is 0x10. The second byte, at offset 0x01, is a reserved byte for which I didn't find any description, so I set it to 0x00. The word at offset 0x02 tells the BIOS how many sectors we want to transfer with a single INT 13H call (the BIOS manufactured by Phoenix, in its EDD version, accepts 0x007F as a maximum value). In our case just one is enough, so the value to use is 0x0001, stored as the bytes 01 00 (remember the little-endian convention). The double word at offset 0x04 is the memory location in RAM to copy the HDD data to. It is in Segment:Offset format and little-endian, so you should read the bytes in the following order: Low-Offset, High-Offset, Low-Segment, High-Segment (this is what Lo, Ho, Ls and Hs stand for). Finally, the quadword at offset 0x08 is a 64-bit number giving the first LBA block of the disk to start copying from. The other 8 bytes are optional and we are not going to use them, since we decided to use the version that is 0x10 (16 in decimal) bytes long. That is all there is to the syntax of an LBA read with INT 13H.
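To make the byte order concrete, here is a minimal sketch in Python (not the DEBUG.EXE code used in this post) that packs the 16-byte version of the packet. The function name build_dap and its parameters are my own, chosen only for illustration; the LBA value is the 0x90 90 90 90 placeholder and the buffer is 0x0000:7E00.

```python
import struct

def build_dap(lba, segment, offset, sectors=1):
    """Pack a 16-byte disk address packet for INT 13H, AH=0x42.

    Hypothetical helper for illustration only.
    """
    # "<" means little-endian with no padding. Fields, in order:
    #   B  packet size (0x10)      B  reserved (0x00)
    #   H  sectors to transfer     H  buffer offset
    #   H  buffer segment          Q  64-bit starting LBA
    return struct.pack("<BBHHHQ", 0x10, 0x00, sectors, offset, segment, lba)

dap = build_dap(lba=0x90909090, segment=0x0000, offset=0x7E00)
print(dap.hex(" "))
# prints: 10 00 01 00 00 7e 00 00 90 90 90 90 00 00 00 00
```

Note how the offset 0x7E00 comes out as 00 7e (Lo, Ho) before the segment 00 00 (Ls, Hs), and how the four placeholder bytes land in the low half of the quadword.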
Here is my idea for the test. I need to read a block from the disk whose content is known to me (so that I can check what INT 13H is reading) and transfer it to a RAM region of my choice. The read with INT 13H is successful if I find the known, expected content in RAM. Even though I am studying FAT16 and getting some ideas about how data is organized on the disk, I don't want to anticipate concepts here, so the only block whose content and location on the disk I feel confident about is the Boot Record of the "TEST" partition. GRUB2 loads the Boot Record into RAM starting at 0x07C 00, and I will read it again starting at absolute address 0x07E 00 in RAM. Then I will look for some specific data that I know exists in my Boot Record. The software that performs this check just dumps two paragraphs of memory on the screen so that I can see whether the result is the one I expected. Once again I abandon, for the moment, the step-by-step sequence of screenshots with DEBUG.EXE (we both know how to use it) and comment directly on the final result.

Fig. B - LBA-Read in binary

In Fig. B you see what the Boot Record looks like once loaded in memory by GRUB2. The initial sequence of green / green-pink / pink / green bytes is the same as in Fig. F of the post "Learn read before you write": we jump over the BPB and perform a general reset. The orange sequence performs the disk read, which may end with an error. The light violet sequence of bytes immediately following is a conditional jump that branches depending on whether a read error occurred. If there was an error, there is no jump and the CPU continues into the light orange code, whose task is to show an error message on the screen; otherwise, the code jumps to 0x07C 90 and dumps two paragraphs on the screen. In the end, both branches join in the yellow code, which performs the usual task of showing the "Press any key..." message, waiting for a keypress and rebooting. At address 0x07D 80, I placed the function nibble_to_ASCII. At address 0x07D 90 is the well-known show_string function. At address 0x07D A0 we find the exit message string. At address 0x07D C0 we find the error message string. The "xx" part of the string is filled in at run-time by the code with the error value returned in AH by INT 13H, if an error occurs. This is important because, if an error occurs, I would like to know which one it was. At address 0x07D D0 we find the LBA disk address packet that DS:SI must point to before calling INT 13H. This address packet is incomplete as written here, but the code completes it at run-time. From 0x07D D0 to 0x07D D3 it is the same as explained in Fig. A. From 0x07D D4 to 0x07D D7 I have already hard-coded the address 0x0000:7E00 where I want to transfer the block of the disk. From 0x07D D8 to 0x07D DF we have the 8-byte LBA block number of the HDD location where we want to start reading. Given that it is little-endian, this corresponds to 0x00 00 00 00 90 90 90 90.
The four high bytes are a hard-coded sequence of 0x00. This is because I know that the LBA number will never be high enough to need them. How do I know it for sure? Well, I observed in Fig. C of the post "Learn read before you write" that every entry in the partition table uses four bytes to describe the LBA starting point of the partition on the disk. So with four bytes of LBA, it must be possible to reach every block on my HDD. The remaining four low bytes are coded with an initial default value of my choice, 0x90 90 90 90 (you can use any value you like), which is irrelevant since it is replaced at run-time with the real value required for reading. I could have coded the real value here directly, but I preferred to do it at run-time just for my training. In other words, I wanted my code to read the information about the HDD organization at run-time from the BIOS Parameter Block (BPB). From 0x07D E0 to the end, I put my own sequence of bytes, including the boot signature, which I want to use as a kind of signature for the block. The code will search for this signature at the new memory location in RAM if the LBA read succeeds.
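The run-time completion of the packet can be modelled in a few lines of Python. This is only a sketch under assumptions: I assume the partition start is taken from the 4-byte "hidden sectors" field at offset 0x1C of a FAT16 BPB, and the value 63 in the fake boot record is made up for the example. The names patch_dap_lba, HIDDEN_SECTORS_OFFSET and DAP_LBA_OFFSET are hypothetical.

```python
import struct

HIDDEN_SECTORS_OFFSET = 0x1C  # assumption: FAT16 BPB "hidden sectors" dword
DAP_LBA_OFFSET = 0x08         # quadword holding the starting LBA in the packet

def patch_dap_lba(boot_record, dap):
    """Overwrite the low four LBA bytes of the packet with the value from the BPB."""
    (start_lba,) = struct.unpack_from("<I", boot_record, HIDDEN_SECTORS_OFFSET)
    struct.pack_into("<I", dap, DAP_LBA_OFFSET, start_lba)
    return start_lba

# Fake 512-byte boot record whose BPB says the partition starts at LBA 63
# (a made-up value, not the one from my disk).
boot_record = bytearray(512)
struct.pack_into("<I", boot_record, HIDDEN_SECTORS_OFFSET, 63)

# Packet as hard-coded in the Boot Record, with the 0x90 placeholder bytes.
dap = bytearray.fromhex("10000100007e00009090909000000000")
patch_dap_lba(boot_record, dap)
print(dap.hex(" "))
# prints: 10 00 01 00 00 7e 00 00 3f 00 00 00 00 00 00 00
```

Only the four low bytes are touched; the four high bytes keep their hard-coded 0x00. With 512-byte sectors, four bytes of LBA address up to 2^32 sectors, that is 2 TiB.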

Fig. C and Fig. D show the organization of the assembly code. Of course, this code was entered with DEBUG.EXE, but it is very hard for me to add comments to the debug screenshots, so I typed it into an Excel file, which lets me generate more readable pictures.

Fig. C - LBA-Read in assembly (part 1)

In Fig. C you see the beginning of the code, which consists of the first long jump and the general reset (identical to Fig. G of the post "Learn read before you write", so I avoided repeating it here). Moving forward in the code, we look at the block responsible for the LBA disk read. You can observe that I took the information about the beginning of the "TEST" partition "D:\" from the BIOS Parameter Block (BPB). Since we are interested in the Boot Record, and we know that this is the block at the beginning of the partition "D:\", I can use this value immediately without manipulating it (no offset is required) and write it into the LBA disk address packet in place of the sequence of bytes 0x90 90 90 90. Address 0x07C 5D is the place where the code branches. If there is no read error it jumps ahead, where it finds the routine for dumping two paragraphs of RAM on the screen, and then jumps back to the exit sequence at 0x07C 78. The "show error block" converts the hexadecimal error code in AH into ASCII and saves the result in the "xx" part of the error message string. After that, it prints the string on the screen by calling the show_string function. The "exit sequence" is the same as usual.
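The idea behind the "show error block" can be sketched in Python as follows. This is a model of the logic, not a transcription of the assembly in Fig. C: the wording of the message and the error value 0x0C are hypothetical, and fill_error_code is a name I made up. nibble_to_ascii mirrors the role of my nibble_to_ASCII function.

```python
def nibble_to_ascii(value):
    """Return the ASCII code of the hex digit for the low nibble of value."""
    nibble = value & 0x0F
    if nibble < 10:
        return ord("0") + nibble
    return ord("A") + nibble - 10

def fill_error_code(message, error_code):
    """Replace the "xx" placeholder in message with error_code as two hex digits."""
    pos = message.index(b"xx")
    message[pos] = nibble_to_ascii(error_code >> 4)  # high nibble first
    message[pos + 1] = nibble_to_ascii(error_code)   # then the low nibble

# Hypothetical message text; the real string lives at 0x07D C0.
message = bytearray(b"Read error xx")
fill_error_code(message, 0x0C)
print(message.decode("ascii"))
# prints: Read error 0C
```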

Fig. D - LBA-Read in assembly (part 2)

In Fig. D, you see the "dump 2 paragraphs" sequence (starting at address 0x07C 90), which is a little more interesting. In this code I use INT 10H many times, so I don't want to keep resetting AH and BX every time I call it. Instead, I preset the values of these registers at the beginning, taking care that they remain unchanged for the rest of the routine. The logic of this block is based on two nested FOR-loops. The outer loop accounts for the action to be repeated for each paragraph (only 2 paragraphs in this case); the inner loop accounts for the action to be repeated for every byte of the paragraph (16 bytes per paragraph). Since I planned in detail what to do, I don't use the general scheme for nesting FOR-loops (as shown in code xx); I use the CPU internal registers instead of the stack. Every time I convert a byte from hexadecimal to its corresponding ASCII code, I have to start from the high nibble, since the output is printed on the screen from left to right. However, the function nibble_to_ASCII converts the low nibble into ASCII code, so I have to shift the high nibble of AL down into the low nibble. I cannot (or rather do not want to) use SHR AL, CL with CL=4, because almost all of my CPU registers are busy and I would have to move data around and back again, consuming more bytes of code than the eight bytes required to code SHR AL, 1 four times. So I may agree with you if you consider this coding technique inelegant; nevertheless, it economizes on code size. In the inner FOR-loop, I print a space on screen after the dump of each byte, and in the outer FOR-loop I print a "line feed" and "carriage return" sequence to separate the paragraphs onto different lines.
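The structure of this routine, two nested loops with the high nibble printed first, can be modelled in Python like this. It is a sketch of the logic only, not of the register usage in Fig. D: dump_paragraphs and HEX_DIGITS are hypothetical names, the text is collected in a string instead of being printed with INT 10H, and the test memory is made up.

```python
HEX_DIGITS = "0123456789ABCDEF"

def dump_paragraphs(memory, start, paragraphs=2):
    """Return a hex dump of `paragraphs` 16-byte paragraphs starting at `start`."""
    out = []
    address = start
    for _ in range(paragraphs):      # outer loop: one pass per paragraph
        for _ in range(16):          # inner loop: one pass per byte
            value = memory[address]
            high = value
            for _ in range(4):       # four single-bit shifts, like 4 x SHR AL, 1
                high >>= 1
            out.append(HEX_DIGITS[high])           # high nibble goes out first
            out.append(HEX_DIGITS[value & 0x0F])   # then the low nibble
            out.append(" ")                        # space after each byte
            address += 1
        out.append("\r\n")           # new line after each paragraph
    return "".join(out)

# Made-up memory content: 32 bytes counting up from 0x00.
memory = bytes(range(32))
text = dump_paragraphs(memory, 0)
print(text, end="")
```

In Python the shift could of course be written as value >> 4; the loop of four single shifts is there only to mirror the choice made in the assembly.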

Fig. E - LBA-Read runs in real mode

In Fig. E you can see the result on the screen, so we finally know how to read from the disk.
