Drew McGowen

Reputation: 11706

32-bit assignments for 16-bit port addresses

As I was digging through the original Xbox kernel's code, I noticed that sometimes when it sets up the registers for port I/O, it assigns a 32-bit value to edx, even though the in and out instructions only use the low 16 bits of edx for the port address. As an example:

mov     edx, 0FFFF8004h
in      ax, dx
or      ax, 1
out     dx, ax
add     edx, 1Eh
in      ax, dx
or      ax, 2
out     dx, ax
mov     edx, 0FFFF8002h
...

Elsewhere (such as SMBus read and write), it's inconsistent; sometimes it assigns 16-bit values to dx, other times 32-bit values to edx.

If the upper 16 bits are never used, what's the point of specifying non-zero bits for them?

Upvotes: 2

Views: 335

Answers (1)

Ross Ridge

Reputation: 39581

My guess is that this was done as a micro-optimization to avoid a non-existent hazard and/or an insignificant performance penalty.

For example, the programmer may have originally written something like:

66| BA 8004     mov     dx, 8004h
66| ED          in      ax, dx
66| 83 C8 01    or      ax, 1
66| EF          out     dx, ax
66| 83 C2 1E    add     dx, 1Eh

He then decided to replace add dx with add edx in order to save a byte and eliminate the performance penalty for decoding the operand-size prefix:

66| BA 8004     mov     dx, 8004h
66| ED          in      ax, dx
66| 83 C8 01    or      ax, 1
66| EF          out     dx, ax
83 C2 1E        add     edx, 1Eh
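To make the one-byte saving concrete, here is a rough Python sketch that tallies the encoded bytes of the two listings above. The byte values are copied from the listings (with the 16-bit immediate 8004h written in little-endian memory order as 04 80); the names ORIGINAL, REVISED, total_size and prefixed are made up for this illustration.

```python
# Encoded bytes for 32-bit protected mode, taken from the listings above.
# 0x66 is the operand-size prefix that selects 16-bit operands.
ORIGINAL = {
    "mov dx, 8004h": bytes([0x66, 0xBA, 0x04, 0x80]),
    "in ax, dx":     bytes([0x66, 0xED]),
    "or ax, 1":      bytes([0x66, 0x83, 0xC8, 0x01]),
    "out dx, ax":    bytes([0x66, 0xEF]),
    "add dx, 1Eh":   bytes([0x66, 0x83, 0xC2, 0x1E]),
}

# Same sequence, but with the 16-bit add replaced by the 32-bit form,
# which needs no prefix.
REVISED = dict(ORIGINAL)
del REVISED["add dx, 1Eh"]
REVISED["add edx, 1Eh"] = bytes([0x83, 0xC2, 0x1E])


def total_size(listing):
    """Total number of code bytes in the listing."""
    return sum(len(encoding) for encoding in listing.values())


def prefixed(listing):
    """Number of instructions that start with the 0x66 prefix."""
    return sum(1 for encoding in listing.values() if encoding[0] == 0x66)


for name, listing in (("original", ORIGINAL), ("revised", REVISED)):
    print(name, total_size(listing), "bytes,", prefixed(listing), "prefixed")
```

The difference is a single byte and a single prefix, which is the whole extent of the saving being discussed.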

Then he reads this in a contemporary Intel optimization manual:

Because Pentium II and Pentium III processors can execute code out of order, the instructions need not be immediately adjacent for the stall to occur. Example 2-7 also contains a partial stall.

Example 2-7 Partial Register Stall with Pentium II and Pentium III Processors

MOV AL, 8
MOV EDX, 0x40
MOV EDI, new_value
ADD EDX, EAX        ; Partial stall accessing EAX

His own code now looks similar, so he avoids the partial register stall by replacing the 16-bit MOV instruction with the 32-bit one you see in your example. (In reality I don't think the ADD instruction will ever stall; the IN and OUT instructions should give the MOV instruction more than enough time to retire.)
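As a sanity check that the 32-bit form is functionally harmless, the following Python sketch models edx as a 32-bit integer and shows that both variants address the same ports, since only the low 16 bits reach the I/O bus. It is a model of the register arithmetic only, not of real port I/O; the helper names and the stale value 12345678h are invented for the illustration.

```python
MASK32 = 0xFFFFFFFF
MASK16 = 0xFFFF


def port_of(edx):
    """IN/OUT with dx as the port operand only see the low 16 bits of edx."""
    return edx & MASK16


def mov_dx(edx, imm16):
    """Model of 'mov dx, imm16': the upper 16 bits of edx are left untouched."""
    return (edx & ~MASK16 & MASK32) | (imm16 & MASK16)


def add_edx(edx, imm):
    """Model of 'add edx, imm' with 32-bit wraparound."""
    return (edx + imm) & MASK32


stale = 0x12345678               # arbitrary leftover contents of edx (assumed)

# Variant 1: 16-bit mov followed by a 32-bit add (partial register write).
v1 = mov_dx(stale, 0x8004)
ports_16 = [port_of(v1), port_of(add_edx(v1, 0x1E))]

# Variant 2: full 32-bit mov, as in the Xbox kernel excerpt.
v2 = 0xFFFF8004
ports_32 = [port_of(v2), port_of(add_edx(v2, 0x1E))]

print([hex(p) for p in ports_16])
print([hex(p) for p in ports_32])
```

Both variants end up at ports 8004h and 8022h, so whatever sits in the upper half of edx makes no difference to the hardware being addressed.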

And yes, these micro-optimizations would be pointless. Even if they do save a CPU cycle or two, the performance gain would be insignificant compared to the time it takes to execute the I/O instructions. But it wouldn't be at all surprising to see a Microsoft employee doing this. I've seen dumber things than this in Microsoft code, and during the '90s at least they seemed pretty obsessed with micro-optimizations.

The inconsistency you see is also not surprising. Microsoft would have had a number of different programmers working on the Xbox kernel, and could have easily included code from Windows or other projects.

Upvotes: 2
