Tuesday, September 4, 2007

The Case of the Missing Stack Frames

Our friend Peter Hosey recently mentioned a problem that frequently makes debugging crash reports harder than usual. The problem is that stack traces don't always display every stack frame in the call chain, which can make it difficult or impossible to determine the flow of control that led to the crash. Let's look at a demonstration of the problem, its causes, and its solutions.

Update: I probably should have mentioned this before- optimizations make debugging hard. Debug builds should not use any optimizations at all, unless you enjoy headaches. With no optimizations in place, your debug builds will not suffer from missing stack frames. This post is intended to help developers who have received crash reports from users testing release builds, where optimizations have been made.

Note: I'll be using only the x86 assembly in this post. If anyone is interested, let me know and I'll post the ppc equivalents in a followup post.

A sample app:

int func2(int x)
return x - 1;

int func1(int x)
int result = func2(x);
return result;

int main(int argc, const char** argv)
return func1(1);

Here's what that looks like when compiled with no optimizations(-O0):

+0 00001f7e 55 pushl %ebp
+1 00001f7f 89e5 movl %esp,%ebp
+3 00001f81 83ec08 subl $0x08,%esp
+6 00001f84 8b4508 movl 0x08(%ebp),%eax
+9 00001f87 83e801 subl $0x01,%eax
+12 00001f8a c9 leave
+13 00001f8b c3 ret

+0 00001f8c 55 pushl %ebp
+1 00001f8d 89e5 movl %esp,%ebp
+3 00001f8f 83ec18 subl $0x18,%esp
+6 00001f92 8b4508 movl 0x08(%ebp),%eax
+9 00001f95 890424 movl %eax,(%esp,1)
+12 00001f98 e8e1ffffff calll _func2
+17 00001f9d c9 leave
+18 00001f9e c3 ret

+0 00001f9f 55 pushl %ebp
+1 00001fa0 89e5 movl %esp,%ebp
+3 00001fa2 83ec18 subl $0x18,%esp
+6 00001fa5 c7042401000000 movl $0x00000001,(%esp,1)
+13 00001fac e8dbffffff calll _func1
+18 00001fb1 c9 leave
+19 00001fb2 c3 ret

Standard stuff- main calls func1, func1 calls func2. Each function's prolog pushes the base pointer onto the stack, saves the previous stack pointer as the new base pointer, and updates the stack pointer as needed for that function. And each function's epilog uses the 'leave' instruction to copy the base pointer back to the stack pointer, and pop the the old base pointer back into %ebp before returning control to the calling function. Because these functions perform this standard housekeeping, gdb can provide us with an accurate backtrace if we ask for it:

Breakpoint 1, 0x00001f84 in func2 ()
(gdb) bt
#0 0x00001f84 in func2 ()
#1 0x00001f9d in func1 ()
#2 0x00001fb1 in main ()

So far, so good. Now let's crank up the optimization level to -O2:

+0 00001f8e 55 pushl %ebp
+1 00001f8f 89e5 movl %esp,%ebp
+3 00001f91 8b4508 movl 0x08(%ebp),%eax
+6 00001f94 83e801 subl $0x01,%eax
+9 00001f97 5d popl %ebp
+10 00001f98 c3 ret

+0 00001f99 55 pushl %ebp
+1 00001f9a 89e5 movl %esp,%ebp
+3 00001f9c 5d popl %ebp
+4 00001f9d e9ecffffff jmpl 0x00001f8e

+0 00001fa2 55 pushl %ebp
+1 00001fa3 89e5 movl %esp,%ebp
+3 00001fa5 c7450801000000 movl $0x00000001,0x08(%ebp)
+10 00001fac 5d popl %ebp
+11 00001fad e9e7ffffff jmpl 0x00001f99

Now we have a problem.

Each function still has the usual prolog, but the epilogs have changed. There are no 'leave' instructions, and there's only one 'ret' instruction, in func2. gdb is now unable to give us an accurate backtrace:

Breakpoint 1, 0x00001f91 in func2 ()
(gdb) bt
#0 0x00001f91 in func2 ()
#1 0x00001f72 in _start ()
#2 0x00001e99 in start ()

The stack frames from main and func1 are no longer visible in the backtrace(_start calls main in every application, though gdb may sometimes exclude it from the backtrace).

There are two differences that we should note:

• The mechanism by which one function transfers control to another function has changed. Under -O2 optimization, 'calll' has been replaced with 'jmpl', which is sometimes referred to as a tail call optimization.
• Each function now pops the stack into %ebp before transferring control to the called function.

These differences have these results:

• The return address is not updated across function calls, because 'calll' is not used(%eip is not pushed onto the stack).
• %ebp retains its original value across function calls, because the caller pops %ebp(after it pushes %ebp in its prolog) before transferring control to the called function.

Because of these changes, gdb is unable to give us an accurate backtrace. The specific flag that causes this code generation is -foptimize-sibling-calls, which is included in -O2 and higher.

Depending on our needs, there are two ways to disable this optimization.

• The sledgehammer: -fno-optimize-sibling-calls will prevent gcc from generating the problem code across the entire executable.
• The scalpel: Inserting an extra line of code in a function will prevent gcc from generating the problem code on a per-function basis:

int func1(int x)
int result = func2(x);
__asm__ volatile("");
return result;

The asm directive tells gcc to insert whatever text we specify into the intermediate assembly file before it gets passed to the assembler. Luckily, gcc does not parse our string(s), so it has no idea what logic, if any, that we're adding. Since it appears to gcc that there is additional logic after the function call, the tail call optimization is no longer possible. The 'volatile' keyword prevents gcc from moving the asm directive from its current location in the function.

Update: The volatile keyword only prevents the compiler from moving the asm block relative to other asm blocks, not relative to other lines of C code. The above use of 'result' after the asm block is the more important detail of this code snippet. The volatile keyword is not in fact needed, and this hack is a bit more fragile than I originally thought.

There is one other optimization that can result in missing stack frames: -fomit-frame-pointer. gcc's man page doesn't seem to explain which optimization levels can include this flag, but it's easy to disable it entirely with -fno-omit-frame-pointer. Another issue with -fomit-frame-pointer is that it only results in missing stack frames on some architectures. I'll leave it to the reader to explore that one.


Peter Hosey said...

Thanks. ☺

Blake C. said...

Cheers - thanx for the inspiration :)

Anonymous said...

Very good to know! However, if you have control over the code like that are are trying to debug a problem, why not just kick it into debug mode with no optimizations? That's why there is a difference between debug and release mode. It's assumed that you basically can't debug that kind of stuff in release mode because of optimizations, which are there for a reason.

Peter Hosey said...

Anonymous: Because you can't be sure that the problem will still exist in debug mode. If it only manifests in a release, a debug build won't help.

That said, this flag (-fno-optimize-sibling-calls) should certainly be used in debug builds.

Blake C. said...

Peter, my opinion is that debug builds should include no optimizations at all, since most optimizations are known to make debugging harder to some degree. If the default setting for debug configurations is anything other than -O0, I would consider that a bug.

Peter Hosey said...

Blake: Of course. But we don't ordinarily ship debug builds; we ship release builds. And if only the release build exhibits the bug, then we have to suck it up and debug the app with the (release-build) optimizations in place.

Blake C. said...

I don't know why I didn't catch this before, but your suggestion to use -fno-optimize-sibling-calls in debug builds may not be necessary, because I would expect -O0 to disable all optimizations, including this one. I haven't proven this, but it stands to reason.

Peter Hosey said...

Blake C.: Makes sense.