
FPU, MMX, XMM and BBQ

August 7, 2011

So you wrote a debugger with the Windows API, everything is working fine and dandy, but then you want to access the x87 FPU registers. If you’re totally insane you’ll even ask me how to access the MMX and SSE registers.

Now, you wouldn’t be reading this if I didn’t tell you in the next few paragraphs, so read on.

I will be assuming you know how to use GetThreadContext and have a rough idea what is or should be in a CONTEXT struct.

FPU

x86

To get the FPU registers delivered with your CONTEXT struct, the context.ContextFlags has to contain CONTEXT_FLOATING_POINT. After calling GetThreadContext, you can access context.FloatSave which is a struct of type FLOATING_SAVE_AREA and is defined as such:

typedef struct _FLOATING_SAVE_AREA {
    DWORD   ControlWord;
    DWORD   StatusWord;
    DWORD   TagWord;
    DWORD   ErrorOffset;
    DWORD   ErrorSelector;
    DWORD   DataOffset;
    DWORD   DataSelector;
    BYTE    RegisterArea[80];
    DWORD   Cr0NpxState;
} FLOATING_SAVE_AREA;

The field we are interested in is RegisterArea, a buffer of 80 bytes – 10 bytes for each of the 8 FPU registers. How do you access ST(n) in a byte buffer? Simple: &context.FloatSave.RegisterArea[n*10] is the start of the 10-byte buffer for that register.
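The indexing above can be wrapped in a small helper. This is only a minimal sketch: it takes a plain 80-byte buffer standing in for context.FloatSave.RegisterArea, and the name copyStRegister is made up for illustration.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies the 10 raw bytes of ST(n) out of an
// 80-byte x87 register area (e.g. context.FloatSave.RegisterArea).
// Returns all zeroes if n is out of range.
std::array<uint8_t, 10> copyStRegister(const uint8_t (&registerArea)[80], unsigned n)
{
    std::array<uint8_t, 10> out{};
    if (n < 8) {
        std::memcpy(out.data(), &registerArea[n * 10], out.size());
    }
    return out;
}
```

In a debugger you would call it as copyStRegister(context.FloatSave.RegisterArea, n) after a successful GetThreadContext.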

x64

If you’re already dancing and celebrating because it will all compile nicely on both x86 and x64 I have to disappoint you. Microsoft decided to slightly change the layout of the CONTEXT struct for the x86-64 architecture. Instead of FloatSave, you have to access FltSave (what genius decided to drop the ‘oa’?), a structure of type XMM_SAVE_AREA32:

typedef struct _XMM_SAVE_AREA32 {
    WORD   ControlWord;
    WORD   StatusWord;
    BYTE  TagWord;
    BYTE  Reserved1;
    WORD   ErrorOpcode;
    DWORD ErrorOffset;
    WORD   ErrorSelector;
    WORD   Reserved2;
    DWORD DataOffset;
    WORD   DataSelector;
    WORD   Reserved3;
    DWORD MxCsr;
    DWORD MxCsr_Mask;
    M128A FloatRegisters[8];
    M128A XmmRegisters[16];
    BYTE  Reserved4[96];
} XMM_SAVE_AREA32, *PXMM_SAVE_AREA32;

Looks similar, doesn’t it? This time we have to access the FloatRegisters member, an array of 8 registers. Each value is 128 bits wide but really only the lowest 80 bits are valid and represent the state of the corresponding FPU register.
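To pull the 80 valid bits out of one of those 128-bit slots, something along these lines should do. It is a sketch under assumptions: FloatSlot128 is a stand-in for the Windows M128A type (assumed to be a 64-bit Low followed by a 64-bit High), copyFloatRegister is a made-up name, and the host is little-endian.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Stand-in for the Windows M128A type (assumed layout: Low, then High).
struct FloatSlot128 {
    uint64_t Low;
    int64_t  High;
};

// Hypothetical helper: extracts the 10 meaningful bytes of an x87 register
// from a 128-bit slot such as context.FltSave.FloatRegisters[n].
// Assumes a little-endian host.
std::array<uint8_t, 10> copyFloatRegister(const FloatSlot128& slot)
{
    std::array<uint8_t, 10> out{};
    std::memcpy(out.data(), &slot.Low, 8);       // 64-bit mantissa
    std::memcpy(out.data() + 8, &slot.High, 2);  // sign bit + 15-bit exponent
    return out;
}
```

The resulting 10 bytes have the same shape as one entry of the 32-bit RegisterArea, so the same decoding code can handle both.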

80 bits != 64 bits

If your compiler supports 80 bit floating point values (long double in GCC), you’re pretty much done here – just load the value from the buffer and manipulate away! However, if your compiler decided not to implement or drop support for 80-bit float values (like VC++), you’ll run into a small problem. How do you load an 80-bit float without a native data type to represent it?

You could use inline assembler, sure. Something as simple as this should do (beware, pseudo code):

FLD TBYTE [&context.FloatSave.RegisterArea[n*10]] // context.FltSave.FloatRegisters on x64
FSTP QWORD [&myFloat64]

Sadly, this won’t work with VC++ since there is no inline assembler anymore when compiling for 64 bits. So we have to go for something more general, namely a function that will convert a buffer of 10 bytes into a double. The extended-precision floating-point format is pretty straightforward, and after googling for usable source code, a nice few minutes with the Intel manual, some time spent coding and testing and a lot of coke (the drink!) we end up with something like this:

#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>
double readFloat80(const uint8_t buffer[10])
{
    //80 bit extended-precision value as used by the x87 FPU:
    //1 bit sign, 15 bit exponent, 64 bit mantissa (explicit integer bit)

    const uint16_t SIGNBIT    = 1 << 15;
    const uint16_t EXP_BIAS   = (1 << 14) - 1; // 2^(n-1) - 1 = 16383
    const uint16_t SPECIALEXP = (1 << 15) - 1; // all bits set
    const uint64_t HIGHBIT    = (uint64_t)1 << 63;
    const uint64_t QUIETBIT   = (uint64_t)1 << 62;

    // Extract sign, exponent and mantissa (memcpy avoids unaligned,
    // type-punned pointer casts)
    uint16_t exponent;
    uint64_t mantissa;
    std::memcpy(&exponent, &buffer[8], sizeof(exponent));
    std::memcpy(&mantissa, &buffer[0], sizeof(mantissa));

    double sign = (exponent & SIGNBIT) ? -1.0 : 1.0;
    exponent   &= ~SIGNBIT;

    // Check for undefined values
    if((!exponent && (mantissa & HIGHBIT)) || (exponent && !(mantissa & HIGHBIT))) {
        return std::numeric_limits<double>::quiet_NaN();
    }

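    // Check for special values (zero, denormals, infinity, NaN)
    if(exponent == 0) {
        if(mantissa == 0) {
            return sign * 0.0;
        } else {
            // denormalized: far too small for a double, so the
            // computation at the end underflows to (signed) zero
        }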
    } else if(exponent == SPECIALEXP) {
        if(!(mantissa & ~HIGHBIT)) {
            return sign * std::numeric_limits<double>::infinity();
        } else {
            if(mantissa & QUIETBIT) {
                return std::numeric_limits<double>::quiet_NaN();
            } else {
                return std::numeric_limits<double>::signaling_NaN();
            }
        }
    }

    //value = (-1)^s * (m / 2^63) * 2^(e - 16383)
    double significand = ((double)mantissa / ((uint64_t)1 << 63));
    return sign * std::ldexp(significand, exponent - EXP_BIAS);
}

You will of course lose some precision (the 64-bit mantissa has to fit into the 53 bits a double offers, and the exponent range shrinks as well), but the same happens when you go through the FPU via ASM and store the result as a QWORD. Please note that the code only works on little-endian systems and expects the 80-bit float in the buffer to be stored in little-endian, too.

I didn’t bother with performing the conversion the other way around, but if you understood what the code above does, you won’t have trouble implementing that yourself. It’s basically just a matter of shifting around bits :)
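For the curious, here is one way the reverse direction might look. This is a sketch rather than tested debugger code: writeFloat80 is a made-up name, NaN payloads are not preserved, and the bytes are written out one at a time so the result is little-endian regardless of the host.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical counterpart to readFloat80: encodes a double as an 80-bit
// extended-precision value, written to the buffer in little-endian order.
void writeFloat80(double value, uint8_t buffer[10])
{
    const uint16_t EXP_BIAS   = 16383;
    const uint16_t SPECIALEXP = 0x7FFF;
    const uint64_t HIGHBIT    = (uint64_t)1 << 63;
    const uint64_t QUIETBIT   = (uint64_t)1 << 62;

    uint16_t exponent = 0;
    uint64_t mantissa = 0;

    if (std::isnan(value)) {
        exponent = SPECIALEXP;
        mantissa = HIGHBIT | QUIETBIT;           // quiet NaN, payload dropped
    } else if (std::isinf(value)) {
        exponent = SPECIALEXP;
        mantissa = HIGHBIT;
    } else if (value != 0.0) {
        // frexp yields |value| = m * 2^e with m in [0.5, 1)
        int e = 0;
        double m = std::frexp(std::fabs(value), &e);
        mantissa = (uint64_t)std::ldexp(m, 64);  // in [2^63, 2^64), integer bit set
        exponent = (uint16_t)(e - 1 + EXP_BIAS);
    }
    if (std::signbit(value)) {
        exponent = (uint16_t)(exponent | 0x8000);
    }

    // Store mantissa (8 bytes) and sign/exponent (2 bytes), low byte first
    for (int i = 0; i < 8; ++i) {
        buffer[i] = (uint8_t)(mantissa >> (8 * i));
    }
    buffer[8] = (uint8_t)(exponent & 0xFF);
    buffer[9] = (uint8_t)(exponent >> 8);
}
```

Every finite double, including the denormal ones, fits as a normalized 80-bit value, so this direction loses nothing apart from NaN payloads.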

MMX

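MMX is very simple once you realize that the MMX registers are just aliases for the low 64 bits (the mantissa) of the FPU registers, e.g. MM0 <-> ST(0), etc. Strictly speaking MMn aliases the physical register Rn rather than the stack-relative ST(n), but MMX instructions reset the top-of-stack to 0, so the two coincide whenever MMX code has just run. To get their value we will use the same context members as above but will only load the lower 64 bits (remember, little-endian!):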

uint64_t* p_mmn = (uint64_t*)&context.FloatSave.RegisterArea[n*10]; // context.FltSave.FloatRegisters on x64
uint64_t mmn = *p_mmn;

And that’s that :)
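If the pointer cast makes you nervous (the 10-byte slots are not 8-byte aligned), a memcpy-based variant does the same job. A minimal sketch, assuming a plain 80-byte buffer standing in for context.FloatSave.RegisterArea and a little-endian host; readMmxRegister is a made-up name.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads the low 64 bits of the n-th 10-byte slot
// of an 80-byte x87 register area (context.FloatSave.RegisterArea).
// memcpy sidesteps the unaligned, type-punned pointer cast.
// Returns 0 if n is out of range.
uint64_t readMmxRegister(const uint8_t (&registerArea)[80], unsigned n)
{
    uint64_t value = 0;
    if (n < 8) {
        std::memcpy(&value, &registerArea[n * 10], sizeof(value));
    }
    return value;
}
```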

But what if the current CPU doesn’t support MMX? Well, nothing is going to explode because obviously it will still have FPU registers.

You could fiddle around with cpuid to find out whether MMX is present but there’s a much simpler solution: IsProcessorFeaturePresent(PF_MMX_INSTRUCTIONS_AVAILABLE).

XMM (aka SSE)

Now the trickiest part. I say tricky because it’s hard to find any information on it at all.

x64

In the x86-64 CONTEXT you can easily access the SSE registers because every x86-64 CPU comes with SSE2 or higher (they’re at 5 already :s) so Microsoft decided to be easy on us.

Which members do we have to access? Well, if you look at the definition of XMM_SAVE_AREA32 again, you will notice the XmmRegisters member. Bingo! There you have all of the 16 128-bit registers.
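Once you have one of those 128-bit values, you will probably want to look at it as packed floats. A rough sketch, assuming a little-endian host: XmmSlot128 is a stand-in for the Windows M128A type (assumed to be a 64-bit Low followed by a 64-bit High), and xmmAsFloats is a made-up name.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Stand-in for the Windows M128A type (assumed layout: Low, then High).
struct XmmSlot128 {
    uint64_t Low;
    int64_t  High;
};

// Hypothetical helper: reinterprets a 128-bit register value, such as
// context.FltSave.XmmRegisters[n], as four packed single-precision floats.
// Lane 0 is the lowest 32 bits. Assumes a little-endian host.
std::array<float, 4> xmmAsFloats(const XmmSlot128& reg)
{
    static_assert(sizeof(float) == 4, "expects 32-bit floats");
    std::array<float, 4> lanes{};
    std::memcpy(&lanes[0], &reg.Low, 8);   // lanes 0 and 1
    std::memcpy(&lanes[2], &reg.High, 8);  // lanes 2 and 3
    return lanes;
}
```

The same trick works for two doubles or sixteen bytes; which view is the right one depends on what the debuggee was doing with the register.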

NOTE: In my copy of WinNT.h it says that ContextFlags must contain CONTEXT_MMX_REGISTERS to get those registers, but interestingly there is no such flag. Instead, use CONTEXT_FLOATING_POINT. Probably something that has to do with backward compatibility, who knows with MS.

x86

But what about good old x86? Well, there’s a huge buffer in the context struct called ExtendedRegisters. It’s a 512 byte buffer that contains all sorts of rubbish but also our SSE registers.

You know the ContextFlags drill, this time we need CONTEXT_EXTENDED_REGISTERS.

Finding info on the offsets is hard but shortly before giving up I stumbled upon the GDB source code. These peeps probably debugged the Windows internals to find that info but fortunately for us GDB is open source! To cut it short, the SSE registers can be found at offset 160 into the structure.

Note that there are only 8 SSE registers in x86, unlike x64 which has 16. This depends on the CPU mode, so a 64-bit-capable machine running in x86 mode (32-bit OS or processes running inside WoW64) will still only give you those 8 registers.

I’m not sure if the layout changes if XMM is not present but the values you retrieve won’t make much sense in that case anyway. To check for XMM we will use our old friend IsProcessorFeaturePresent again, this time with PF_XMMI_INSTRUCTIONS_AVAILABLE.

And some pseudo code for the unimaginative (yes I made up uint128_t, suit yourself):

uint128_t* p_xmmn = (uint128_t*)&context.ExtendedRegisters[(10+n)*16];
uint128_t xmmn = *p_xmmn;
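Since uint128_t does not exist, here is a version that compiles. It is a sketch that takes a plain 512-byte buffer standing in for context.ExtendedRegisters and uses the offset of 160 mentioned above; copyXmmRegister is a made-up name.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies the 16 raw bytes of XMMn out of a 512-byte
// buffer laid out like context.ExtendedRegisters. The registers start at
// offset 160; only XMM0..XMM7 exist in 32-bit mode.
// Returns all zeroes if n is out of range.
std::array<uint8_t, 16> copyXmmRegister(const uint8_t (&extendedRegisters)[512], unsigned n)
{
    const std::size_t XMM_OFFSET = 160;
    std::array<uint8_t, 16> out{};
    if (n < 8) {
        std::memcpy(out.data(), &extendedRegisters[XMM_OFFSET + n * 16], out.size());
    }
    return out;
}
```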

BBQ

Simple if you know it, huh? I’ll leave you to endless nights of debugging your debugger :)
