
LibreOffice + Java 7 = BEX

Just upgraded from LibreOffice 3.4.1 for Windows to 3.5.0 and it kept crashing during the splash screen with a buffer overflow exception (BEX) in msvcr90.dll.

Turns out the culprit is Java Runtime 7. For whatever reason LibreOffice doesn’t detect the JRE 7 during setup/startup.

On a clean install it would complain about Java missing, but with old config files it seems to do things it shouldn’t be doing. I didn’t feel like digging deeper into this, because the fix is quite simple. Let the devs figure this one out 🙂

To fix the crashes:

  1. Delete the whole folder: C:\Users\{USER}\AppData\Roaming\LibreOffice
  2. Run LibreOffice; it will complain a few times about Java missing (you might have to start it twice, as it shuts down the first time around)
  3. Go to Tools > Options > LibreOffice > Java and manually add the path to the installed JRE (usually C:\Program Files\Java\jre7).

How to read C and C++ variable declarations

A few days ago I started wondering about the weirdness around variable and function declarations in C (and, naturally, C++). Every few months I need to look up how to declare something fancy like functions returning function pointers, but I forget the syntax the minute it compiles.

Probably quite a few people were introduced to C variable declarations like me – as a small set of instructions for how to get this or that variable type. For example, putting a * behind the type is how you declare a pointer. As it turns out, that’s not a coincidence, but merely a logical consequence of the way variables in C are declared.

Maybe you know all this already, at least subconsciously, but realizing the simplicity of it all helped me a great deal in reading and especially writing C.

Starting with the basics

A variable declaration in C has the following form:

[typename] [expression containing variable name]

which can be read as an assertion that if you use a variable in the same way as in the expression (right side), it will yield the specified type (left side).

This obviously is an oversimplification (e.g. square brackets allow non-fixed indices) but it’s the basic concept behind the C syntax for variable declarations.

Let’s start with the simplest example – just pretend you see something like this for the first time:

int i;

Simple enough: every time the compiler encounters our variable i, it will be of type int.

Now let’s look at a pointer to that type:

int *i; // pointer to int

This basically translates to: If you see i anywhere, and apply the dereference operator * to it, you will, again, get an int.

That’s the reason some people write int *i and others int* i. It’s a matter of preference but you’ll need the first version as soon as you have more than one pointer in a declaration:

int* i, j; // whooops! i = int pointer, j = int

Pointer to pointer reads exactly the same way (one quick example below), so we can move on to arrays.
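
int **i; // apply * once for an int pointer, twice for the int itself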

Arrays

Let’s warm up a little more with array declarations, and pointers to arrays:

int i[5]; // array of 5 int

is an array of 5 int. Here again, the rule is: If you see i anywhere, and apply the index operator [] to it, you will get an int.

The same goes for multi-dimensional arrays, for example an array of 3 times 5 int (3 rows, 5 columns) is declared

int i[3][5]; // array of 3 [ array of 5 int ]

The [] operators are applied left to right, so with the first index operator you choose one of the 3 arrays of 5 int. Applying the second operator then lets you pick one of those 5 int.
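
For example (a quick usage sketch):

int i[3][5];
int *row = i[2]; // first [] picks one of the 3 arrays of 5 int
int x = i[2][4]; // the second [] then picks an int out of that array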

Then what is

int *i[5];

? Just apply the operators backwards! Note: The index operator is applied before the dereference operator.

First, reverse the dereference: if we apply a * we should get an int, so it’s an int pointer. Now we’re left with the square brackets, meaning it’s an array of 5 of them. An array of 5 int pointers!

How do you declare a pointer to an array then? Like this:

int (*i)[4]; // pointer to array of 4 int

We change the order the operators are applied in with the parentheses, i.e. first * and then []. Reversing operations in our head again, we have something that, if [] is applied, yields an int. So, right before applying [] we should have an int array. Then what type must i have so that applying * to it results in an int array? 😉
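
Spelled out as a small sketch (arr is made up):

int arr[4];
int (*i)[4] = &arr; // i: pointer to array of 4 int
int x = (*i)[2];    // dereference first, then index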

Function pointers

There’s not much more to it than that once you’ve realized that pointer or array declarations aren’t fancy ways of explicitly defining a type, but rather implicit descriptions in the form of type equality.

Logically, function declarations work the same:

int f(int, float); // function (int, float) -> int

Take any occurrence of f, put a (, an int, a float and a closing ) behind it, and you’ll end up with an int.

Function pointers are essentially the same, except for the need for another indirection: we first have to dereference the pointer.

A function pointer to a function like the one above looks like this:

int (*f)(int, float); // pointer to function (int, float) -> int

All we had to change was to add the *. As with the pointer to array before, the () binds more tightly than the *, so we have to change the order using parentheses around *f so that the pointer is dereferenced first, and the resulting function is called afterwards. If we hadn’t done that, we’d have a function returning an int pointer instead.
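
A quick usage sketch, with a made-up function g of matching signature:

int g(int a, float b) { return a; }

int (*f)(int, float) = &g; // or just g – function names decay to pointers
int x = (*f)(1, 2.0f);     // explicit dereference...
int y = f(1, 2.0f);        // ...or the equivalent shorthand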

Let’s change the pointer to a function returning an int*. The only difference is that you get the int by applying an additional * after calling the function pointer, so our declaration looks just like that:

int *(*f)(int, float); // pointer to function (int, float) -> int*

About time to get to where I was going with this whole article: functions returning function pointers. How about a function that returns a function pointer like the one we declared above?

We have the same frame as above, but we have to modify f to reflect the fact that there is another level of indirection: the *f simply becomes *f():

int *(*f())(int, float); // function () -> [ pointer to function (int, float) -> int* ]

because we have to call f first to get the function pointer, then we dereference that pointer, then call it, and finally apply * to get the int.
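
In use, the operations happen in exactly that order (assuming f from above is defined somewhere):

int *(*fp)(int, float) = f(); // call f to get the function pointer
int *pi = (*fp)(1, 2.0f);     // dereference and call it
int  x  = *pi;                // finally apply * to get the int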

And a pointer to that function we just declared? Just turn the f into (*f) to dereference the pointer, the rest is the same:

int *(*(*f)())(int, float); // pointer to function () -> [ pointer to function (int, float) -> int* ]

The cure for the common break fest

As you can see, things get very tricky very fast, so you shouldn’t be doing this for anything beyond 1 or 2 levels of indirection.

That’s where typedefs come in: they allow you to express a complicated type with a single, new typename. The syntax is straightforward:

typedef [typename] [expression containing new typename]

It looks just like a variable declaration. Basically, you declare a variable named x, put typedef in front of the declaration, and you get a new type x that behaves like the type of a variable declared that way.

A small example:

int (*a1)[10] = 0; // pointer to array of 10 int
typedef int (*ax)[10]; // definition of new type ax
ax a2 = 0;
a1 = a2; // this works, a1 and a2 are of the same type

This makes the whole business a lot easier than hacking together the declarations every single time. Take for example this small snippet:

#include <stdio.h>

// returns pointer to int
int *func2(int a, float b)
{
        static int x = 0;
        return &x;
}

// returns pointer to function returning pointer to int
int *(*func1())(int, float)
{
        return &func2;
}

int main()
{
        int *(*(*ptr1)())(int, float) = &func1; // pointer to function returning pointer to function returning pointer to int
        int *(*ptr2)(int, float) = (*ptr1)(); // pointer to function returning pointer to int

        int val = *(*ptr2)(0, 0.0f);

        printf("%i\n", val);
        return 0;
}

Those are the same declarations as above, but deciphering and especially writing them from scratch takes time and is fairly error prone.

The same code using typedef, a lot easier on the eye:

#include <stdio.h>

int *func2(int a, float b)
{
        static int x = 0;
        return &x;
}

// func2_type == pointer to function (int, float) -> int*
typedef int *(*func2_type)(int, float);

func2_type func1()
{
        return &func2;
}

// func1_type == pointer to function () ->func2_type
typedef func2_type (*func1_type)();

int main()
{
        func1_type ptr1 = &func1;
        func2_type ptr2 = (*ptr1)();

        int val = *(*ptr2)(0, 0.0f);

        printf("%i\n", val);
        return 0;
}

You only have to do the work once, and simply use the new type as a return type of functions. Things become a lot more readable and maintainable.

Finito

That’s it for now. I hope this helps a few people get a better understanding of the C syntax, even if this probably isn’t new to most intermediate and expert C programmers.

Either way it was fun to write up and test. If you have any questions or feedback, let me know.

FPU, MMX, XMM and BBQ

So you wrote a debugger with the Windows API, everything is working fine and dandy, but then you want to access the x87 FPU registers. If you’re totally insane you’ll even ask me how to access the MMX and SSE registers.

Now, you wouldn’t be reading this if I didn’t tell you in the next few paragraphs, so read on.

I will be assuming you know how to use GetThreadContext and have a rough idea what is or should be in a CONTEXT struct.

FPU

x32

To get the FPU registers delivered with your CONTEXT struct, the context.ContextFlags has to contain CONTEXT_FLOATING_POINT. After calling GetThreadContext, you can access context.FloatSave which is a struct of type FLOATING_SAVE_AREA and is defined as such:

typedef struct _FLOATING_SAVE_AREA {
    DWORD   ControlWord;
    DWORD   StatusWord;
    DWORD   TagWord;
    DWORD   ErrorOffset;
    DWORD   ErrorSelector;
    DWORD   DataOffset;
    DWORD   DataSelector;
    BYTE    RegisterArea[80];
    DWORD   Cr0NpxState;
} FLOATING_SAVE_AREA;

The field we are interested in is RegisterArea, a buffer of 80 bytes – 10 bytes for each of the 8 FPU registers. How do you access ST(n) in a byte buffer? Simple: &context.FloatSave.RegisterArea[n*10] is the start of the 10-byte buffer for that register.
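
Putting it together, a minimal sketch – hThread is assumed to be a thread handle opened with THREAD_GET_CONTEXT access:

CONTEXT context = {0};
context.ContextFlags = CONTEXT_FLOATING_POINT;
if(GetThreadContext(hThread, &context))
{
    // raw 10-byte (80-bit) value of ST(0)
    const BYTE *st0 = &context.FloatSave.RegisterArea[0 * 10];
}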

x64

If you’re already dancing and celebrating because it will all compile nicely on both x86 and x64, I have to disappoint you. Microsoft decided to slightly change the layout of the CONTEXT struct for the x86-64 architecture. Instead of FloatSave, you have to access FltSave (what genius decided to drop the ‘oa’?), a structure of type XMM_SAVE_AREA32:

typedef struct _XMM_SAVE_AREA32 {
    WORD   ControlWord;
    WORD   StatusWord;
    BYTE  TagWord;
    BYTE  Reserved1;
    WORD   ErrorOpcode;
    DWORD ErrorOffset;
    WORD   ErrorSelector;
    WORD   Reserved2;
    DWORD DataOffset;
    WORD   DataSelector;
    WORD   Reserved3;
    DWORD MxCsr;
    DWORD MxCsr_Mask;
    M128A FloatRegisters[8];
    M128A XmmRegisters[16];
    BYTE  Reserved4[96];
} XMM_SAVE_AREA32, *PXMM_SAVE_AREA32;

Looks similar, doesn’t it? This time we have to access the FloatRegisters member, an array of 8 registers. Each value is 128 bits wide, but really only the lowest 80 bits are valid and represent the state of the corresponding FPU register.
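
The x64 counterpart of the sketch above (hThread assumed again):

CONTEXT context = {0};
context.ContextFlags = CONTEXT_FLOATING_POINT;
if(GetThreadContext(hThread, &context))
{
    M128A st0 = context.FltSave.FloatRegisters[0];
    // only the low 80 bits (st0.Low plus the low 16 bits of st0.High) are valid
}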

80 bits != 64 bits

If your compiler supports 80-bit floating point values (long double in GCC), you’re pretty much done here – just load the value from the buffer and manipulate away! However, if your compiler never implemented or has dropped support for 80-bit floats (like VC++), you’ll run into a small problem. How do you load an 80-bit float without a native data type to represent it?

You could use inline assembler, sure. Something as simple as this should do (beware, pseudo code):

FLD TBYTE [&context.FloatSave.RegisterArea[n*10]] // context.FltSave.FloatRegisters on x64
FSTP QWORD [&myFloat64]

Sadly, this won’t work with VC++ since there is no inline assembler anymore when compiling for 64-bit targets. So we have to go for something more general, namely a function that converts a buffer of 10 bytes into a double. The IEEE floating-point format is pretty straightforward, and after googling for usable source code, a nice few minutes with the Intel manual, some time spent coding and testing and a lot of coke (the drink!), we end up with something like this:

#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

double readFloat80(const uint8_t buffer[10])
{
    // 80-bit x87 extended-precision value (IEEE 754 double extended):
    // 1 bit sign, 15 bit exponent, 64 bit mantissa (explicit integer bit)

    const uint16_t SIGNBIT    = 1 << 15;
    const uint16_t EXP_BIAS   = (1 << 14) - 1; // 2^(n-1) - 1 = 16383
    const uint16_t SPECIALEXP = (1 << 15) - 1; // all bits set
    const uint64_t HIGHBIT    = (uint64_t)1 << 63;
    const uint64_t QUIETBIT   = (uint64_t)1 << 62;

    // Extract exponent and mantissa (memcpy avoids strict-aliasing trouble)
    uint16_t exponent;
    uint64_t mantissa;
    std::memcpy(&exponent, &buffer[8], sizeof(exponent));
    std::memcpy(&mantissa, &buffer[0], sizeof(mantissa));

    double sign = (exponent & SIGNBIT) ? -1.0 : 1.0;
    exponent   &= ~SIGNBIT;

    // Check for undefined values (integer bit inconsistent with the exponent)
    if((!exponent && (mantissa & HIGHBIT)) || (exponent && !(mantissa & HIGHBIT))) {
        return std::numeric_limits<double>::quiet_NaN();
    }

    // Check for special values (zero, denormals, infinity, NaN)
    if(exponent == 0) {
        if(mantissa == 0) {
            return sign * 0.0;
        } else {
            // Denormal: same formula as below, but with a fixed exponent of
            // 1 - bias (this underflows to zero in a double anyway)
            return sign * std::ldexp((double)mantissa / ((uint64_t)1 << 63), 1 - EXP_BIAS);
        }
    } else if(exponent == SPECIALEXP) {
        if(!(mantissa & ~HIGHBIT)) {
            return sign * std::numeric_limits<double>::infinity();
        } else if(mantissa & QUIETBIT) {
            return std::numeric_limits<double>::quiet_NaN();
        } else {
            return std::numeric_limits<double>::signaling_NaN();
        }
    }

    //value = (-1)^s * (m / 2^63) * 2^(e - 16383)
    double significand = (double)mantissa / ((uint64_t)1 << 63);
    return sign * std::ldexp(significand, exponent - EXP_BIAS);
}

You will of course lose some precision (we dropped 2 bytes!), but the ASM approach above loses it too when storing to a 64-bit double. Please note that the code only works on little-endian systems and expects the 80-bit float in the buffer to be stored little-endian, too.

I didn’t bother with performing the conversion the other way around, but if you understood what the code above does, you won’t have trouble implementing that yourself. It’s basically just a matter of shifting around bits 🙂

MMX

MMX is very simple once you realize that the MMX registers are just aliases for the low 64 bits (the mantissa) of the FPU registers, e.g. MM0 <-> ST(0), etc. To get their value we will use the same context members as above but only load the lower 64 bits (remember, little-endian!):

uint64_t* p_mm0 = (uint64_t*)&context.FloatSave.RegisterArea[n*10]; // context.FltSave.FloatRegisters on x64
uint64_t mm0 = *p_mm0;

And that’s that 🙂

But what if the current CPU doesn’t support MMX? Well, nothing is going to explode because obviously it will still have FPU registers.

You could fiddle around with cpuid to find out whether MMX is present but there’s a much simpler solution: IsProcessorFeaturePresent(PF_MMX_INSTRUCTIONS_AVAILABLE).

XMM (aka SSE)

Now the trickiest part. I say tricky because it’s hard to find any information on it at all.

x64

In the x86-64 CONTEXT you can easily access the SSE registers because every x86-64 CPU comes with SSE2 or higher (they’re at 5 already :s), so Microsoft decided to go easy on us.

Which members do we have to access? Well, if you look at the definition of XMM_SAVE_AREA32 again, you will notice the XmmRegisters member. Bingo! There you have all of the 16 128-bit registers.

NOTE: In my copy of WinNT.h it says that ContextFlags must contain CONTEXT_MMX_REGISTERS to get those registers, but interestingly there is no such flag. Instead, use CONTEXT_FLOATING_POINT. Probably something that has to do with backward compatibility, who knows with MS.
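
Putting the note into practice, a minimal sketch (hThread assumed as before):

CONTEXT context = {0};
context.ContextFlags = CONTEXT_FLOATING_POINT; // not CONTEXT_MMX_REGISTERS
if(GetThreadContext(hThread, &context))
{
    M128A xmm0 = context.FltSave.XmmRegisters[0];
}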

x32

But what about good old x86? Well, there’s a huge buffer in the context struct called ExtendedRegisters. It’s a 512-byte buffer that contains all sorts of rubbish but also our SSE registers.

You know the ContextFlags drill, this time we need CONTEXT_EXTENDED_REGISTERS.

Finding info on the offsets is hard, but shortly before giving up I stumbled upon the GDB source code. These peeps probably debugged the Windows internals to find that info, but fortunately for us GDB is open source! To cut a long story short, the SSE registers can be found at offset 160 into the structure.

Note that there are only 8 SSE registers in x86, unlike x64 which has 16. This depends on the CPU mode, so a 64-bit-capable machine running in x86 mode (32-bit OS or processes running inside WoW64) will still only give you those 8 registers.

I’m not sure if the layout changes if XMM is not present but the values you retrieve won’t make much sense in that case anyway. To check for XMM we will use our old friend IsProcessorFeaturePresent again, this time with PF_XMMI_INSTRUCTIONS_AVAILABLE.

And some pseudo code for the unimaginative (yes I made up uint128_t, suit yourself):

uint128_t* p_xmmn = (uint128_t*)&context.ExtendedRegisters[(10+n)*16];
uint128_t xmmn = *p_xmmn;
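
If you don’t want to invent a type, copying the raw bytes works just as well:

BYTE xmm_n[16];
memcpy(xmm_n, &context.ExtendedRegisters[160 + n * 16], sizeof(xmm_n)); // offset 160 + n*16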

BBQ

Simple if you know it, huh? I’ll leave you to endless nights of debugging your debugger 🙂

0xEditbox – enforcing hex input

For editboxes there’s a style ES_NUMBER which only allows decimal digits as input. Recently I needed that sort of functionality for hex digits. I also wanted the editbox to control text being posted from the clipboard and limit its size automatically according to the size of the variable it holds.

I ended up coding my own class template which subclasses the editbox to a window procedure that monitors all WM_CHAR, WM_PASTE and WM_SETTEXT messages. If the user tries to enter an invalid character it beeps and immediately returns without passing the message to the old window procedure.

Values are retrieved and updated using the member functions Get, which returns a value of the type specified in the template instance, and Set, which takes a value of that type. You can additionally specify whether output is zero-filled, which can be used to temporarily override the flag set in the constructor.

Usually you would declare it in your window/dialog procedure and create an instance once the edit box was created (e.g. WM_INITDIALOG or WM_CREATE after creating all child windows):

INT_PTR CALLBACK DlgHandler(HWND Window, UINT Message, WPARAM wParam, LPARAM lParam)
{
  static cHexEdit<DWORD> * AwesomeEdit, * AmazingEdit;
  switch(Message)
  {
  case WM_INITDIALOG:
    AwesomeEdit = new cHexEdit<DWORD>(GetDlgItem(Window, EdAwesome));
    AmazingEdit = new cHexEdit<DWORD>(GetDlgItem(Window, EdAmazing), true); // true = zero-filled output
    return TRUE;
  case WM_COMMAND:
    switch(LOWORD(wParam)) // the control ID; HIWORD(wParam) holds the notification code
    {
    case BTN_DOIT:
      DWORD Input, Output;
      Input = AwesomeEdit->Get();
      Output = Superfunkshun(Input);
      AmazingEdit->Set(Output);
      break;
    }
    break;
  case WM_DESTROY:
    delete AwesomeEdit;
    delete AmazingEdit;
    break;
  }
  return FALSE;
}

The maximum input size is limited by the type used in the template instance. If you specify BYTE, it allows up to 2 digits, 4 for WORD, 8 for DWORD, etc.

I wrote the conversion functions myself as I didn’t want to fiddle with format specifiers for sprintf and sscanf. Theoretically there’s no limit to the variable size. However, you can’t use floating point types. The conversion would probably work, but the digit calculation is wrong – nothing that can’t be fixed though, I guess 😉

I figured I’d put this here in case someone else needs something similar:

cHexEdit.h

SetThreadContext Fail on x64

On all 64-bit versions of Windows I have been experiencing weird crashes with apps running in a debugger, caused by all sorts of access violations in ntdll. I figured it must have been my code doing something incorrectly which works on x86 but breaks on 64-bit Windows.

The exceptions would usually happen after the break at ntdll!DbgBreakPoint, somewhere around RtlUserThreadStart on Vista SP2. I even got DEP access violations at random memory addresses outside of ntdll. This behaviour seemed to be consistent on all 64-bit Windows versions: I got similar exceptions on XP SP2, Vista SP2 and Win7 RC1.

I was able to nail it down to a call to SetThreadContext which was responsible for setting the DRX registers. The code did nothing more than call GetThreadContext, set DR0 through DR3 and DR7, and call SetThreadContext. I suspected the DRX values were corrupted, so I removed everything between the two API calls. Surprisingly, the error remained. When I just removed SetThreadContext, it worked flawlessly.

Since I didn’t need to read and update anything other than the debug registers, I had set the context flags to CONTEXT_DEBUG_REGISTERS. Lacking any logical explanation, I played around with the flags and tried CONTEXT_ALL. All of a sudden the exceptions were gone and the debuggee started up normally. Wow.

Turns out that combining CONTEXT_DEBUG_REGISTERS with any other valid flag fixes this problem.

Just to be sure, I added this piece of code to the CREATE_PROCESS_DEBUG_EVENT handler:

CONTEXT Context;
Context.ContextFlags = CONTEXT_DEBUG_REGISTERS;
HANDLE Thread = Event.u.CreateProcessInfo.hThread;
GetThreadContext(Thread, &Context);
SetThreadContext(Thread, &Context);

This perfectly reproduced all the crashes I was experiencing before. Changing the second line to:

Context.ContextFlags = CONTEXT_CONTROL | CONTEXT_DEBUG_REGISTERS;

made it work like a charm.

I had another bunch of DEP violations after every INT1 but didn’t realize it was the same problem. I forgot to adapt the flags for one piece of code which read and updated the debug registers after a hardware breakpoint or single step. After fixing that, I had a perfectly working debugger.

I can’t remember how much time I spent finding that stupid bug. Bill, you owe me big-time.

Hardware breakpoints on Windows

The Intel documentation on DRX breakpoints is at times confusing and complicated and (obviously) doesn’t really give a clue what the implementation on Windows looks like. This article hopefully clears up some of the issues you might be faced with while implementing hardware breakpoints in your debugger/application.

Note: this is not an introduction to the topic, nor to be used as a reference. For solid information get a copy of the “Intel 64 and IA-32 Architectures Software Developer’s Manual”, specifically Volume 3B (“System Programming Guide, Part 2”), Chapter 18 (“Debugging and Performance Monitoring”).

Local only

On Windows, only local hardware breakpoints work, i.e. they only exist in a thread’s context, not system-wide. Kinda makes sense considering each thread has its own set of registers. Global hardware breakpoints aren’t global to the process, but global to the whole system.

Windows, at least more recent versions, will make sure you don’t enable system-wide breakpoints.

Basics

When a hardware breakpoint is hit, an EXCEPTION_SINGLE_STEP exception is raised. This is kind of confusing at first, but since both DRX breakpoints and single-stepping are so-called “debug exceptions” (which translate to INT1), they share the same exception type.

You differentiate between the two by examining DR6. It’s possible for both to be reported in one event.

According to the Intel manual you have to clear DR6 after every debug exception. However, it looks like Windows does that for us. I would recommend clearing DR6 yourself though, because I’m not sure that this is guaranteed on all Windows versions (at least XP SP3 and Vista SP2 do clear it).
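
A minimal sketch of the clearing step (hThread assumed; note the combined flags – see the SetThreadContext post above):

CONTEXT context = {0};
context.ContextFlags = CONTEXT_CONTROL | CONTEXT_DEBUG_REGISTERS;
if(GetThreadContext(hThread, &context))
{
    context.Dr6 = 0; // clear the status bits after every debug exception
    SetThreadContext(hThread, &context);
}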

If the conditions specified in DR7 are fulfilled and one of the breakpoints is triggered, you receive an EXCEPTION_SINGLE_STEP event. It’s possible that the exception is raised because of several simultaneous breakpoints. You will get separate events for single step and hardware breakpoints on execution; this is not the case for single step combined with breakpoints on read/write, or for breakpoints of the same type. The safe way is to examine all bits in DR6 and act accordingly. If two breakpoints are identical, DR6 reports both. Be aware that the bits for a breakpoint are set in DR6 even if the breakpoint is not enabled in DR7, so to be perfectly sure, also check whether the enable bit is set in DR7.

Since breakpoints on read/write are traps (i.e. they break after the instruction that caused the breakpoint to trigger), you can just continue execution. Breakpoints on execution are faults: they break before the instruction is executed. If you just continued, the same instruction would be hit again and the breakpoint would trigger once more.

The fairly easy solution is to set the resume flag in the EFlags register (bit 16). It temporarily disables breakpoints on execution for one instruction. It doesn’t affect any other type of breakpoint.
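
Setting the resume flag is a one-liner once you have the thread context (hThread assumed):

CONTEXT context = {0};
context.ContextFlags = CONTEXT_CONTROL;
if(GetThreadContext(hThread, &context))
{
    context.EFlags |= 1 << 16; // RF, the resume flag
    SetThreadContext(hThread, &context);
}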

If only it was that easy…

There is only one drawback, which made me avoid this: VMWare (Workstation 6.5) seems to ignore or discard the resume flag, so that you’ll never get past this instruction. This might be a minor issue to some people, but many use VMWare for reverse engineering or just for running stuff you don’t necessarily want to run on your normal box.

I’m not sure if VirtualBox or other VMs are affected, though.

Single step to the rescue!

Now, what else can we do? The only feasible solution is to disable the breakpoint in DR7, set the trap flag to break on the next instruction, and re-enable the breakpoint after the single step exception.

However, be aware that there might be more than one breakpoint you have to re-enable after a single step event.

Okay, summarizing, here’s some pseudo code:

// ... Awesome debug loop goes here ...
switch(Info->ExceptionRecord.ExceptionCode)
{
case EXCEPTION_SINGLE_STEP:
{
  GetContext(CONTEXT_DEBUG_REGISTERS);
  cDR6 DR6(Context.Dr6);
  cDR7 DR7(Context.Dr7);
  Context.Dr6 = DR6.Clear();
  SetContext();

  // Singlestep after last breakpoint(s)
  if(DR6.SingleStep)
  {
    if(CurBreaks.size()) // Re-enable breakpoints
    {
      do{
        CurBreaks.top()->Enable();
        CurBreaks.pop();
      }
      while(CurBreaks.size());
      Handled = true;
    }
  }
  // DRX breakpoints
  for(int i = 0; i < 4; i++)
  {
    if(DR6.DRX[i] && DR7.DRX[i].Enabled)
    {
      // You might want to check if this is a breakpoint you actually set
      CurBreak = cHardwareBreakpoints::Find(i);

      // ... Do whatever notification here ...

      if(DR7.DRX[i].Type == BpHwOnExec)
      {
        //SetResumeFlag(true); // Doesn't work in VMWare
        CurBreak->Disable();
        CurBreaks.push(CurBreak);
        SetSingleStep(true);
      }
      Handled = true;
    }
  }
}
break;
}

If you don’t mind breaking VMWare compatibility, you can remove the whole

if(DR6.SingleStep)

block, uncomment

SetResumeFlag(true);

and remove anything else inside that block.

What else?

A few things worth noting:

To enable a hardware breakpoint process-wide, you have to set the debug registers in every spawned thread.
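
A rough sketch of arming DR0 in a single thread – repeat for every thread, e.g. on each CREATE_THREAD_DEBUG_EVENT; address and hThread are assumed:

CONTEXT context = {0};
context.ContextFlags = CONTEXT_CONTROL | CONTEXT_DEBUG_REGISTERS;
if(GetThreadContext(hThread, &context))
{
    context.Dr0  = (DWORD_PTR)address;      // linear address to break on
    context.Dr7 &= ~((DWORD_PTR)0xF << 16); // R/W0 = 00, LEN0 = 00: break on execution
    context.Dr7 |= 1;                       // L0: local enable for DR0
    SetThreadContext(hThread, &context);
}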

If you temporarily disable a breakpoint, you only have to disable it in the current thread. Other threads still have their breakpoints enabled and will break. This is different from, say, INT3 breakpoints, which are enabled or disabled in the whole process (they require changed code after all).

I guess that’s it for the moment. I didn’t want to blow this up by showing how to enable breakpoints in DR7 or parsing DR6. I might post some source for doing that if I get round to cleaning up the code. For the time being, the Intel manual is your best friend 🙂

OutputDebugString awesomeness

A program can communicate with a debugger by sending strings to it through the OutputDebugString API.

What happens then is that a ring3 debugger using the Windows Debug API will receive an OUTPUT_DEBUG_STRING_EVENT. Along with it, it receives relevant information about the debug string inside an OUTPUT_DEBUG_STRING_INFO struct.
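
The string buffer itself lives in the debuggee’s address space, so you have to fetch it with ReadProcessMemory. A minimal sketch (hProcess is assumed to be a handle to the debuggee):

#include <windows.h>
#include <string>
#include <vector>

std::string ReadDebugString(HANDLE hProcess, const OUTPUT_DEBUG_STRING_INFO &info)
{
    // nDebugStringLength includes the terminating zero; per the bottom
    // line below, treat the data as ANSI and ignore fUnicode
    if(info.nDebugStringLength == 0)
        return std::string();

    std::vector<char> buf(info.nDebugStringLength);
    SIZE_T read = 0;
    if(!ReadProcessMemory(hProcess, info.lpDebugStringData, &buf[0], buf.size(), &read) || !read)
        return std::string();

    buf[buf.size() - 1] = '\0'; // be paranoid about the terminator
    return std::string(&buf[0]);
}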

First of all, some trivia:

Unicode? Yes? No?

OUTPUT_DEBUG_STRING_INFO contains a member called fUnicode:

fUnicode

The format of the debugging string. If this member is zero, the debugging string is ANSI; if it is nonzero, the string is Unicode.

Let’s look at what MSDN says about OutputDebugString:

OutputDebugStringW converts the specified string based on the current system locale information and passes it to OutputDebugStringA to be displayed. As a result, some Unicode characters may not be displayed correctly.

So we can’t actually receive any Unicode debug strings. I assume this is a leftover from pre-NT times, as it’s officially unused according to the quote above.

Now back to the Debug API:

Exception? Yes? No?

To continue debugging, the debugger calls ContinueDebugEvent for that event. You additionally have to specify the continue status which can be either DBG_CONTINUE or DBG_EXCEPTION_NOT_HANDLED.

According to MSDN there is no difference between the two for events other than EXCEPTION_DEBUG_EVENT. Unfortunately, that’s not right. There is another debug event for which it does make a difference. Yes, you guessed correctly: a debug string event 😀

OUTPUT_DEBUG_STRING_EVENT behaves just like an exception event: if you continue with DBG_EXCEPTION_NOT_HANDLED, the event is sent a second time, just like a second-chance exception.

Recently I’ve been wondering why I got 2 debug string events for each call to OutputDebugString from the debuggee. I was looking for all sorts of weird solutions until I found out that returning DBG_CONTINUE did the trick.
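
So a debug loop should acknowledge debug string events roughly like this (a minimal sketch):

DEBUG_EVENT ev;
while(WaitForDebugEvent(&ev, INFINITE))
{
    DWORD status = DBG_CONTINUE;

    if(ev.dwDebugEventCode == OUTPUT_DEBUG_STRING_EVENT)
    {
        // handle ev.u.DebugString here, then acknowledge with DBG_CONTINUE --
        // DBG_EXCEPTION_NOT_HANDLED would deliver this event a second time
    }
    // for EXCEPTION_DEBUG_EVENT you would decide between DBG_CONTINUE
    // and DBG_EXCEPTION_NOT_HANDLED as usual

    ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, status);

    if(ev.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
        break;
}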

Reason? Yes? No?

Internally, a debug string is handled as an exception. OutputDebugString calls RaiseException with DBG_PRINTEXCEPTION_C (defined as 0x40010006) and the string address and size as the exception parameters. To keep this transparent to the debugger, Windows effectively swallows the exception and notifies the debugger of it through the debug string notification.

All this is done inside DbgUiConvertStateChangeStructure on the debugger side, which sets up and translates all the debug notification data.

This is even used as anti-debug code: you just set up an exception handler and call RaiseException with DBG_PRINTEXCEPTION_C. If a debugger is present, the exception is swallowed, i.e. if your exception handler does not get called, you’re being debugged.
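
A sketch of that trick (MSVC-style SEH; the exception arguments mirror what OutputDebugStringA passes):

#include <windows.h>

#ifndef DBG_PRINTEXCEPTION_C
#define DBG_PRINTEXCEPTION_C ((DWORD)0x40010006L)
#endif

bool LikelyBeingDebugged()
{
    const char msg[] = "hi there";
    ULONG_PTR args[2] = { sizeof(msg), (ULONG_PTR)msg }; // length, address

    __try {
        RaiseException(DBG_PRINTEXCEPTION_C, 0, 2, args);
    }
    __except(EXCEPTION_EXECUTE_HANDLER) {
        return false; // our handler ran: nothing swallowed the exception
    }
    return true; // a debugger swallowed the exception
}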

As to why this is not affected by the continue status, I have no clue. It can, however, be fixed by patching DbgUiConvertStateChangeStructure to handle it like any other exception.

For pseudo source code for DbgUiConvertStateChangeStructure, you can take a look at OpenRCE. You can also see how the fUnicode member is set to false by default.

Bottom line, always continue with DBG_CONTINUE and ignore the fUnicode member 🙂