Relocatable code
When I think about relocatable things, the image of a shipping container comes
to mind. To me, a container is a good definition of a relocatable thing: it
works on ships, trains and trucks in exactly the same way.
In this post, I talk about the challenges I faced in achieving truly
relocatable code and data in memory. If you wonder why truly relocatable code
is so important, I will tell you that it is the backbone of any computer
system. The operating system (OS) is the general manager of all resources
(hardware and software) and the applications are the tools provided to the
users for many kinds of tasks. One never knows in advance how much RAM is
installed on a computer, or how much RAM is free at any given time. The
operating system manages and solves this kind of problem, hence it must be
free to load and start an application at any RAM address. Additionally,
during the development of any application, one uses libraries and already
developed procedures, so it is important for all these procedures to work at
any position in the address space of the final application. This last case is
about relocatable procedures that can be reused and repackaged based on the
needs of the main program under development. My focus is on this very last
case.
I want to talk here about the very last details that I fixed in order to have
truly relocatable code. The best way for me to explain it is to use an
example, and the best way to learn is to start with a bad example (mistakes
have much more to teach).
The problem
In "HAND_LNK_DBG (the bad one)" I show an example of how I used to code a procedure that was intended to be a full program on its own, in contrast to a service procedure. I include here the log of the assembly session done with DEBUG.EXE because this lets me use the addresses as if they were line numbers that I can mention when I want to refer to them.
Notice that at address SEG:7C00 I did write code this time because this was the main procedure. All service procedures were included (statically linked into the final code) and the final binary was created starting from 0x7C00 until the end. According to everything I said in the post "Hand made linking", this should have been fully relocatable code, but it wasn't, due to a small detail that I am going to explain now. Before that, let me show the test I did with DEBUG.EXE.
Fig. A - Debugging HAND_LNK (the bad one)
In Fig. A, you see that the code worked when loaded at the same memory position where it was developed. The code HAND_LNK is in red and contains inside itself the string (in yellow) and all other service procedures (in green). You can observe how the yellow string changed in memory before and after code execution (white boxes). Then I loaded the HAND_LNK.BIN file at another location in RAM (SEG:0300) and tried to run it, but it failed. Do you have any idea why?
Fig. B - Debugging HAND_LNK (the bad one)
Well, you can find the problem in "HAND_LNK_DBG (the bad one)" at addresses SEG:7C80 and SEG:7C86. Did you spot it? Yes, once again it was the same problem as in Fig. F of the post "AP-PROBE": pointing at strings requires an absolute addressing mode, and this made the code not fully relocatable. No matter where the code was loaded in RAM, all jumps and calls kept working, but the strings were still expected to be found at the same fixed absolute place.
Fig. C - Debugging HAND_LNK (the bad one)
In Fig. C, you see that I went looking at the absolute memory address where the original string was placed, and I saw that the code had indeed modified the RAM content there.
So is there really nothing one can do to obtain software that is truly relocatable in both its code and its data components? Are strings really non-relocatable? Strictly speaking, the code expects to find the string at a fixed offset inside a segment whose position is not fixed, and which is therefore still relocatable.
Let me explain it, be patient. The Intel 8086 implements relocatable software in different ways. There is one strategy for relocatable code and a different one for relocatable data. The strategy for relocatable code is the use of relative addresses for jumps, loops and calls, so that the whole code can be moved anywhere and still work. Instructions for absolute jumps and calls still exist if the programmer needs them, and I like to group things in two categories, absolute and relative addresses:
- A - Absolute address far (this is no longer relocatable per se).
- R1 - Relative address short (this is relocatable).
- R2 - Relative address near (this is still relocatable).
The first addressing mode is what I called "A" (for Absolute) and it lets us change IP and CS together at the same time.
The last two, R1 and R2, are identical except for the size of the relative
address. In case of R1, it is coded using one byte, which is interpreted as a
signed integer, so it can cover a span from -128 to +127 bytes from the
current location of the instruction pointer (IP). In case of R2, it is coded
using two bytes (a signed 16-bit integer, from -32768 to +32767). In both
cases, the code segment remains unchanged and the code jumps within the very
same segment.
For the relative addressing mode of code, it is very important to remember
that the IP register can flip around during a relative jump, call or loop. I
think of a segment as a circle rather than a line of addresses, where the
first and the last address (SEG:0000 and SEG:FFFF) are glued just one after
the other as I am going to demonstrate now with a short experiment.
Fig. D - IP flips around with a backwards call
In Fig. D, I placed at address SEG:0600 a backward call, then I copied it
(with the command m) to address SEG:0100. The bytes 0xE8 0xFD 0xFA mean: push
IP on the stack (0xE8 = CALL) and subtract 1283 bytes from IP (0xFAFD = -1283
in decimal). If executed from address SEG:0100, the backwards call would go
before the address SEG:0000, which means restarting the segment from SEG:FFFF
and continuing backwards from there on.
When I executed just this one instruction, I saw that no exception was thrown
and the CPU executed it just fine, with a flip around the 16-bit boundary of
the register IP. Finally, and most astonishingly, if you create truly
relocatable code and put it halfway across the 16-bit boundary of a segment,
it still works fine, because IP can flip back and forth across the 16-bit
boundary without any problem (I show this at the end of the post).
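The arithmetic behind Fig. D can be checked with a small Python sketch. It is only a model of how a near relative CALL computes its target (the helper names `to_signed` and `near_call_target` are mine, not part of the original code): the displacement is added to the offset of the next instruction and the result is truncated to 16 bits.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret an unsigned `bits`-wide value as a two's complement number."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def near_call_target(ip: int, disp16: int) -> int:
    """Target offset of a 3-byte near CALL (E8 lo hi) located at offset `ip`.

    The displacement is relative to the instruction that follows the CALL,
    and the sum is truncated to 16 bits, which is what lets IP flip around.
    """
    return (ip + 3 + to_signed(disp16, 16)) & 0xFFFF


DISP = 0xFAFD  # bytes E8 FD FA: the displacement is stored little-endian

print(to_signed(DISP, 16))                   # -1283
print(hex(near_call_target(0x0600, DISP)))   # 0x100
print(hex(near_call_target(0x0100, DISP)))   # 0xfc00
print(to_signed(0x80, 8), to_signed(0x7F, 8))  # -128 127 (range of a short jump)
```

From SEG:0600 the call lands on SEG:0100, while the very same bytes executed from SEG:0100 land on SEG:FC00, on the other side of the segment boundary.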
You should stop now for a moment and think about this, just to realize that
the CPU doesn't "think" the way we "think". The CPU is much closer to a
toothed wheel than you may imagine. If you turn the wheel right, all other
wheels turn accordingly. No thinking in between. Just mechanics, as if it
were made of nothing but toothed wheels.
In conclusion, the code part of a procedure is not just relocatable anywhere
in RAM but it is relocatable anywhere within the CS segment and
this is a huge difference (as we are going to see immediately).
The strategy used to implement the relocatable data part of a procedure is the
SEG:OFF addressing mode. In fact, the offset is a relative address from the
start of the segment. In the intention of the 8086 architecture, one should
use the segments DS and ES for data (but thanks to segment override prefixes,
one can also use CS and SS for data if needed). When you access data such as
strings, the access is relative to the DS segment. In practice, a programmer
should implement a strict separation of code (all in CS) and data (all in DS
and/or ES). In this way, the segments can be placed anywhere in RAM and the
code will work perfectly, because the programmer uses only the offset part of
the address while the loader is free to decide where in RAM to allocate all
segments.
In conclusion, the data part of a procedure is not relocatable anywhere
within the segment, but just relocatable somewhere in RAM thanks to the
relocation of the data segments (DS, ES).
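As a minimal sketch of why this works, the following Python lines compute the physical address that the 8086 derives from a SEG:OFF pair in real mode (segment times 16 plus offset, on a 20-bit address bus). The function name and the sample values are hypothetical and chosen only for illustration.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, truncated to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


STRING_OFFSET = 0x0120  # hypothetical offset of a string inside the data segment

# The code always uses the same offset; only the value the loader puts in DS changes.
for ds in (0x1000, 0x2345):
    print(hex(ds), hex(physical_address(ds, STRING_OFFSET)))
# 0x1000 0x10120
# 0x2345 0x23570
```

The offset hard-coded in the program never changes, yet the string ends up in a different place in RAM each time, which is exactly the "relocatable somewhere in RAM" case.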
So far so good with the description of the problem, but the solution to it
will not be so immediate to implement. I can think about a working solution to
this problem but I am not fully convinced from a general perspective.
A general solution should enforce the spirit of segmentation of the 8086 CPU.
In other terms, I should develop a loader that:
- reads a file,
- can distinguish between bytes of code and bytes of data,
- finds the spots in RAM to place the different segments and finally
- loads each portion into the segment it belongs to in RAM.
At the same time, I should develop a program to create such a
"loadable" file starting from a text file containing just mnemonics and
symbolic labels for jumps and calls.
At a second glance, I would realize that I cannot do the complete conversion
from the text file to the loadable file (let us call it executable file) with
one program in one stage. I need two different stages, with a different
program for each stage, in order to create an executable file starting from a
text file containing just mnemonics and symbolic labels. But why does this
have to be so complicated?
Suppose that I have two different procedures that together perform the total
task of the final program I want to build. For each procedure I can
distinguish between a code part and a data part, so let me call the
procedures "PA" and "PB"; I then end up having four logical pieces:
"PA_CODE", "PA_DATA", "PB_CODE" and "PB_DATA". When I write the procedures
"PA" and "PB" I don't logically separate the two components (DATA and CODE),
because those components belong together. In the end, however, I need a
process that can find, take and repackage "PA_CODE" plus "PB_CODE" in the
code segment part of the executable file, and "PA_DATA" plus "PB_DATA" in the
data segment part of the same executable file. The best way to achieve it is
to have two stages.
In the very first stage, a program (let me call it assembler) takes a text
file as input. Not only does it convert mnemonics into opcodes, it also
separates the CODE-part from the DATA-part and saves them in a clear way into
an intermediate file (let me call it object-file), together with all the
information concerning jumps and references from the CODE-part into the
DATA-part within the very same procedure (internal reference or symbol) and
outside it, in case the procedure PA calls the procedure PB (external
reference or symbol).
In the second stage, another program (let me call it linker) takes the
intermediate files (object-files) and creates a bigger package with them. It
takes and groups all the CODE-parts together and separates them from all the
DATA-parts that, once again, are grouped together. Once all groups are
re-packaged in this way, it fixes all references (or symbols) and produces
the final file that can be used by the loader program (let me call this final
file "executable" file or "loadable" file). With all this work done, I
finally have code that is truly relocatable also for the DATA-part of the
software, which now is entirely grouped within a single segment so that the
SEG:OFF strategy of the CPU can really work.
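To make the second stage more concrete, here is a toy sketch in Python of what such a linker could do. It is my own simplification and not the format of any real assembler or linker: the `ObjectFile` layout, the `data_refs` list and the `link` function are all hypothetical, and external references (PA calling PB) are left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Hypothetical object-file: one procedure split into CODE and DATA parts."""
    name: str
    code: bytes
    data: bytes
    # Offsets inside `code` holding a 16-bit little-endian offset into this
    # object's own DATA part; the linker has to patch each of them.
    data_refs: list = field(default_factory=list)


def link(objects):
    """Group all CODE parts and all DATA parts, then fix the data references."""
    # Pass 1: merge the DATA parts and remember where each one starts.
    data_base = {}
    merged_data = bytearray()
    for obj in objects:
        data_base[obj.name] = len(merged_data)
        merged_data += obj.data

    # Pass 2: merge the CODE parts, adding each DATA base to the stored offsets.
    merged_code = bytearray()
    for obj in objects:
        code = bytearray(obj.code)
        for at in obj.data_refs:
            local = int.from_bytes(code[at:at + 2], "little")
            fixed = (local + data_base[obj.name]) & 0xFFFF
            code[at:at + 2] = fixed.to_bytes(2, "little")
        merged_code += code
    return bytes(merged_code), bytes(merged_data)


# BE 00 00 = MOV SI,0000 (offset of the procedure's own string), C3 = RET
pa = ObjectFile("PA", bytes.fromhex("BE0000C3"), b"HELLO$", [1])
pb = ObjectFile("PB", bytes.fromhex("BE0000C3"), b"WORLD$", [1])

code, data = link([pa, pb])
print(code.hex())  # be0000c3be0600c3
print(data)        # b'HELLO$WORLD$'
```

After linking, PB's `MOV SI` points at offset 0x0006 of the merged data segment, where its string now lives, while PA's reference stays at 0x0000.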
You may think while reading, just as I thought while writing, that I have
described the reason why Assembler, Linker and Loader work together in the
very way they currently do, and that the portable executable file format
(.EXE) is the generally valid solution for it.
Personally, I am more than happy to be able to see and understand the real
reason why Assembler, Loader, Linker, .EXE files and operating systems all
work together in this way, but honestly, I cannot solve the problem in the
right and elegant way as they do. I have to think of a workaround that works
at my skill level.
My solution
When I write code with DEBUG.EXE, I see addresses that increase as I scroll down the lines of code, with the lower addresses towards the top of the file and the higher addresses towards the bottom. In Fig. E I drew the arrow going from top to bottom in order to reflect the same reading experience as when I write code in DEBUG.EXE.
Fig. E - addresses in DEBUG and REAL addresses
When I write code in DEBUG.EXE, I know the address of the string relative to the axis used by DEBUG.EXE. I call it SIDebug because DS:SI is the suggested way of pointing at a source string, and this value is relative to DEBUG.EXE. When the code runs, it sits on a different axis of addresses in memory: the REAL axis. I call SIReal the value that SI assumes when the program runs. In the same way, I can mark a second position in the procedure, this time using the register IP. I call IPDebug the value of the instruction pointer that marks the code while I am writing it within DEBUG.EXE, and IPReal the value of IP that marks the same point on the REAL address axis when the program runs. SIDebug and IPDebug are known to me at the moment I write the code. I can get IPReal while the code runs, so I can calculate SIReal at runtime according to the equation in Fig. F, since the distance (offset) between IPReal and SIReal is the same as the distance between IPDebug and SIDebug.
Fig. F - calculation of the correction factor
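Written out explicitly, the relation described above can be sketched as follows. This is my transcription of the reasoning in the text, with the symbol Δ introduced here for the correction factor and the 16-bit wrap-around of the registers made explicit; it may differ in notation from the figure.

```latex
\begin{aligned}
SI_{Real} - IP_{Real} &= SI_{Debug} - IP_{Debug} \\
\Delta &= IP_{Real} - IP_{Debug} \\
SI_{Real} &= SI_{Debug} + \Delta \pmod{2^{16}}
\end{aligned}
```

Only IPReal is unknown while writing the code, so Δ is the single quantity that has to be computed at runtime.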
With this new technique, I can write code and read the addresses directly as they are in DEBUG.EXE, calculate the correction factor at runtime and apply it when necessary. You can see here the new version, "HAND_LNK_DBG (the good one)", and you can find the file "hand_lnk.npp" in the DOWNLOAD AREA.
At address SEG:7C60, I marked the code and created my IPReal. The mnemonic
CALL 7C63 (which I renamed "call myself:") produces the opcode 0xE8 0x00 0x00,
which pushes IP on the stack and continues normal execution with the
following instruction at address SEG:7C63 (MOV BP, SP), because the IP
increment is equal to zero in this case (0x0000). At this point, I had IPReal
on the stack at address SS:[BP + 0x00]. I immediately calculated the
correction factor just by subtracting IPDebug from it.
At addresses SEG:7C8D and SEG:7C96, I converted the SI and DI pointers at
runtime from Debug to Real with the simple addition of the correction factor
stored at SS:[BP + 0x00]. In this way, I could achieve truly relocatable
code, as you can see in Fig. G.
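The same steps can be replayed numerically with a short Python model. It is only a sketch under stated assumptions: the binary image starts at debug offset 0x7C00 and is loaded in one piece at `load_base`, and `SI_DEBUG` is a hypothetical string address, since the real one appears only in the listing. The function names are mine.

```python
MASK = 0xFFFF  # all offsets are 16-bit values


def correction_factor(call_offset_real: int, ip_debug: int) -> int:
    """Model of CALL $+3 (E8 00 00): the pushed return address is IPReal."""
    ip_real = (call_offset_real + 3) & MASK  # offset of the instruction after CALL
    return (ip_real - ip_debug) & MASK


def to_real(pointer_debug: int, delta: int) -> int:
    """Convert a pointer from the Debug axis to the Real axis."""
    return (pointer_debug + delta) & MASK


IP_DEBUG = 0x7C63  # address after the CALL at SEG:7C60, as read in DEBUG.EXE
SI_DEBUG = 0x7C20  # hypothetical address of the string in DEBUG.EXE

for load_base in (0x7C00, 0x0300):
    call_at = (load_base + 0x60) & MASK  # where the CALL ends up at runtime
    delta = correction_factor(call_at, IP_DEBUG)
    print(hex(load_base), hex(delta), hex(to_real(SI_DEBUG, delta)))
# 0x7c00 0x0 0x7c20
# 0x300 0x8700 0x320
```

When the code runs where it was written, the correction factor is zero; when it is loaded at SEG:0300, the factor 0x8700 moves the hypothetical pointer to 0x0320, which is the string's offset 0x20 from the new load address.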
Fig. G - Hand linking is truly relocatable code
For the last test, I just loaded the code at two different memory locations and ran it to verify that it worked correctly. The first time, I loaded it at SEG:023A and then I triggered the test with a direct call and the command p (lines marked in blue). The second time, I loaded it at SEG:1BC7 and repeated the test (lines marked in yellow).
A few paragraphs above, I wrote that the IP register flips around the 16-bit boundary with ease and that I see a segment as a circle rather than a line of addresses, where the first and the last address (SEG:0000 and SEG:FFFF) are glued one after the other. So why not test the relocatable code by placing the procedure halfway across the boundary of the code segment? I thought this was a great idea, so I did it (Fig. H) and, look, it worked as expected!
Fig. H - Hand linking is halfway across the boundary of CS
In Fig. H, I took care to move the stack pointer out of the way, since CS and SS overlap in DEBUG.EXE (lines marked in purple). After that, I used the command m to copy the first half of "Hand Linking" to the end of the segment (lines marked in green) and the last half of it to the beginning of the segment (lines marked in blue). Finally, I prepared the trigger CALL FF90 and ran my test with the command p (yellow lines). In my opinion, this last test shows that the CPU is very much mechanical in its behaviour.
Once again, I wrote quite a long post, and if you made it through to the end, then you must be as passionate as I am about understanding how the PC works.