Wednesday, September 5, 2007
Tuesday, September 4, 2007
The Case of the Missing Stack Frames
Our friend Peter Hosey recently mentioned a problem that frequently makes debugging crash reports harder than usual. The problem is that stack traces don't always display every stack frame in the call chain, which can make it difficult or impossible to determine the flow of control that led to the crash. Let's look at a demonstration of the problem, its causes, and its solutions.
Update: I probably should have mentioned this before- optimizations make debugging hard. Debug builds should not use any optimizations at all, unless you enjoy headaches. With no optimizations in place, your debug builds will not suffer from missing stack frames. This post is intended to help developers who have received crash reports from users testing release builds, where optimizations have been made.
Note: I'll be using only the x86 assembly in this post. If anyone is interested, let me know and I'll post the ppc equivalents in a followup post.
A sample app:
Here's what that looks like when compiled with no optimizations(-O0):
Standard stuff- main calls func1, func1 calls func2. Each function's prolog pushes the base pointer onto the stack, saves the previous stack pointer as the new base pointer, and updates the stack pointer as needed for that function. And each function's epilog uses the 'leave' instruction to copy the base pointer back to the stack pointer, and pop the the old base pointer back into %ebp before returning control to the calling function. Because these functions perform this standard housekeeping, gdb can provide us with an accurate backtrace if we ask for it:
So far, so good. Now let's crank up the optimization level to -O2:
Now we have a problem.
Each function still has the usual prolog, but the epilogs have changed. There are no 'leave' instructions, and there's only one 'ret' instruction, in func2. gdb is now unable to give us an accurate backtrace:
The stack frames from main and func1 are no longer visible in the backtrace(_start calls main in every application, though gdb may sometimes exclude it from the backtrace).
There are two differences that we should note:
• The mechanism by which one function transfers control to another function has changed. Under -O2 optimization, 'calll' has been replaced with 'jmpl', which is sometimes referred to as a tail call optimization.
• Each function now pops the stack into %ebp before transferring control to the called function.
These differences have these results:
• The return address is not updated across function calls, because 'calll' is not used(%eip is not pushed onto the stack).
• %ebp retains its original value across function calls, because the caller pops %ebp(after it pushes %ebp in its prolog) before transferring control to the called function.
Because of these changes, gdb is unable to give us an accurate backtrace. The specific flag that causes this code generation is -foptimize-sibling-calls, which is included in -O2 and higher.
Depending on our needs, there are two ways to disable this optimization.
• The sledgehammer: -fno-optimize-sibling-calls will prevent gcc from generating the problem code across the entire executable.
• The scalpel: Inserting an extra line of code in a function will prevent gcc from generating the problem code on a per-function basis:
The asm directive tells gcc to insert whatever text we specify into the intermediate assembly file before it gets passed to the assembler. Luckily, gcc does not parse our string(s), so it has no idea what logic, if any, that we're adding. Since it appears to gcc that there is additional logic after the function call, the tail call optimization is no longer possible.The 'volatile' keyword prevents gcc from moving the asm directive from its current location in the function.
Update: The volatile keyword only prevents the compiler from moving the asm block relative to other asm blocks, not relative to other lines of C code. The above use of 'result' after the asm block is the more important detail of this code snippet. The volatile keyword is not in fact needed, and this hack is a bit more fragile than I originally thought.
There is one other optimization that can result in missing stack frames: -fomit-frame-pointer. gcc's man page doesn't seem to explain which optimization levels can include this flag, but it's easy to disable it entirely with -fno-omit-frame-pointer. Another issue with -fomit-frame-pointer is that it only results in missing stack frames on some architectures. I'll leave it to the reader to explore that one.
Update: I probably should have mentioned this before- optimizations make debugging hard. Debug builds should not use any optimizations at all, unless you enjoy headaches. With no optimizations in place, your debug builds will not suffer from missing stack frames. This post is intended to help developers who have received crash reports from users testing release builds, where optimizations have been made.
Note: I'll be using only the x86 assembly in this post. If anyone is interested, let me know and I'll post the ppc equivalents in a followup post.
A sample app:
int func2(int x)
{
return x - 1;
}
int func1(int x)
{
int result = func2(x);
return result;
}
int main(int argc, const char** argv)
{
return func1(1);
}
Here's what that looks like when compiled with no optimizations(-O0):
_func2:
+0 00001f7e 55 pushl %ebp
+1 00001f7f 89e5 movl %esp,%ebp
+3 00001f81 83ec08 subl $0x08,%esp
+6 00001f84 8b4508 movl 0x08(%ebp),%eax
+9 00001f87 83e801 subl $0x01,%eax
+12 00001f8a c9 leave
+13 00001f8b c3 ret
_func1:
+0 00001f8c 55 pushl %ebp
+1 00001f8d 89e5 movl %esp,%ebp
+3 00001f8f 83ec18 subl $0x18,%esp
+6 00001f92 8b4508 movl 0x08(%ebp),%eax
+9 00001f95 890424 movl %eax,(%esp,1)
+12 00001f98 e8e1ffffff calll _func2
+17 00001f9d c9 leave
+18 00001f9e c3 ret
_main:
+0 00001f9f 55 pushl %ebp
+1 00001fa0 89e5 movl %esp,%ebp
+3 00001fa2 83ec18 subl $0x18,%esp
+6 00001fa5 c7042401000000 movl $0x00000001,(%esp,1)
+13 00001fac e8dbffffff calll _func1
+18 00001fb1 c9 leave
+19 00001fb2 c3 ret
Standard stuff- main calls func1, func1 calls func2. Each function's prolog pushes the base pointer onto the stack, saves the previous stack pointer as the new base pointer, and updates the stack pointer as needed for that function. And each function's epilog uses the 'leave' instruction to copy the base pointer back to the stack pointer, and pop the the old base pointer back into %ebp before returning control to the calling function. Because these functions perform this standard housekeeping, gdb can provide us with an accurate backtrace if we ask for it:
Breakpoint 1, 0x00001f84 in func2 ()
(gdb) bt
#0 0x00001f84 in func2 ()
#1 0x00001f9d in func1 ()
#2 0x00001fb1 in main ()
So far, so good. Now let's crank up the optimization level to -O2:
_func2:
+0 00001f8e 55 pushl %ebp
+1 00001f8f 89e5 movl %esp,%ebp
+3 00001f91 8b4508 movl 0x08(%ebp),%eax
+6 00001f94 83e801 subl $0x01,%eax
+9 00001f97 5d popl %ebp
+10 00001f98 c3 ret
_func1:
+0 00001f99 55 pushl %ebp
+1 00001f9a 89e5 movl %esp,%ebp
+3 00001f9c 5d popl %ebp
+4 00001f9d e9ecffffff jmpl 0x00001f8e
_main:
+0 00001fa2 55 pushl %ebp
+1 00001fa3 89e5 movl %esp,%ebp
+3 00001fa5 c7450801000000 movl $0x00000001,0x08(%ebp)
+10 00001fac 5d popl %ebp
+11 00001fad e9e7ffffff jmpl 0x00001f99
Now we have a problem.
Each function still has the usual prolog, but the epilogs have changed. There are no 'leave' instructions, and there's only one 'ret' instruction, in func2. gdb is now unable to give us an accurate backtrace:
Breakpoint 1, 0x00001f91 in func2 ()
(gdb) bt
#0 0x00001f91 in func2 ()
#1 0x00001f72 in _start ()
#2 0x00001e99 in start ()
The stack frames from main and func1 are no longer visible in the backtrace(_start calls main in every application, though gdb may sometimes exclude it from the backtrace).
There are two differences that we should note:
• The mechanism by which one function transfers control to another function has changed. Under -O2 optimization, 'calll' has been replaced with 'jmpl', which is sometimes referred to as a tail call optimization.
• Each function now pops the stack into %ebp before transferring control to the called function.
These differences have these results:
• The return address is not updated across function calls, because 'calll' is not used(%eip is not pushed onto the stack).
• %ebp retains its original value across function calls, because the caller pops %ebp(after it pushes %ebp in its prolog) before transferring control to the called function.
Because of these changes, gdb is unable to give us an accurate backtrace. The specific flag that causes this code generation is -foptimize-sibling-calls, which is included in -O2 and higher.
Depending on our needs, there are two ways to disable this optimization.
• The sledgehammer: -fno-optimize-sibling-calls will prevent gcc from generating the problem code across the entire executable.
• The scalpel: Inserting an extra line of code in a function will prevent gcc from generating the problem code on a per-function basis:
int func1(int x)
{
int result = func2(x);
__asm__ volatile("");
return result;
}
The asm directive tells gcc to insert whatever text we specify into the intermediate assembly file before it gets passed to the assembler. Luckily, gcc does not parse our string(s), so it has no idea what logic, if any, that we're adding. Since it appears to gcc that there is additional logic after the function call, the tail call optimization is no longer possible.
Update: The volatile keyword only prevents the compiler from moving the asm block relative to other asm blocks, not relative to other lines of C code. The above use of 'result' after the asm block is the more important detail of this code snippet. The volatile keyword is not in fact needed, and this hack is a bit more fragile than I originally thought.
There is one other optimization that can result in missing stack frames: -fomit-frame-pointer. gcc's man page doesn't seem to explain which optimization levels can include this flag, but it's easy to disable it entirely with -fno-omit-frame-pointer. Another issue with -fomit-frame-pointer is that it only results in missing stack frames on some architectures. I'll leave it to the reader to explore that one.
Subscribe to:
Posts (Atom)