Hell, it seems that the Apple incident inspired me after all and I got our SBP2 bug down :-)
That was a tough nut to crack. The SBP2 driver was failing with a sample 4TB external hard drive, actually giving me a BSOD. However, this was not a nice BSOD like most others: the kernel crashed because it detected a corrupt doubly linked list, and of course I had no idea where the corruption came from.
After cleaning up a lot of stuff I found the culprit... just by looking at the debug messages. We had a failed kernel-mode sanity check that led to a buffer overrun. The buffer sat at the end of a struct, so the overrun overwrote the next struct in memory, and the crazy behaviour only showed up when that OTHER struct was used, much later on.
Anyway, to cut a long story short, the buffer should be 16 bytes instead of 12 bytes. I changed that and looked carefully through the code, then stepped through the initialization code and all looked nice. Sure enough, there was no crash any more; however, nothing else worked either :-D
So I embarked on a journey to find what the heck could be causing this and here is my finding.
Suppose you have a struct like this:
struct ORB_DATA
{
// Some members here...
// At the VERY END of the ORB_DATA struct.
struct ORB_CMD orb;
};
NEVER, EVER ASSUME that:
(sizeof(ORB_DATA) - sizeof(ORB_CMD)) == FIELD_OFFSET(ORB_DATA, orb)
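The compiler may insert padding between members and after the last member to satisfy alignment requirements, so sizeof(ORB_DATA) can be larger than the offset of orb plus sizeof(ORB_CMD). Just use FIELD_OFFSET and leave the neat tricks alone.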
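To make the pitfall concrete, here is a minimal sketch in plain C. The struct members are made up for illustration (they are not the real driver layout), and it uses the standard offsetof macro, which plays the same role as FIELD_OFFSET. It assumes an ABI where uint64_t is 8-byte aligned, as is typical for 64-bit builds.

```c
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real driver structs. */
struct ORB_CMD {
    uint32_t words[3];      /* 12 bytes, 4-byte alignment */
};

struct ORB_DATA {
    uint64_t context;       /* assumed 8-byte aligned (typical 64-bit ABI) */
    struct ORB_CMD orb;     /* at the VERY END of the struct */
};

/* The "neat trick": silently assumes there is no padding after orb. */
size_t orb_offset_naive(void)
{
    return sizeof(struct ORB_DATA) - sizeof(struct ORB_CMD);
}

/* The offset the compiler actually chose for orb. */
size_t orb_offset_real(void)
{
    return offsetof(struct ORB_DATA, orb);
}

/* Print both values plus the trailing padding that separates them. */
void print_orb_layout(void)
{
    printf("sizeof(ORB_DATA) = %zu\n", sizeof(struct ORB_DATA));
    printf("sizeof(ORB_CMD)  = %zu\n", sizeof(struct ORB_CMD));
    printf("naive offset     = %zu\n", orb_offset_naive());
    printf("real offset      = %zu\n", orb_offset_real());
    printf("trailing padding = %zu\n",
           orb_offset_naive() - orb_offset_real());
}
```

Calling print_orb_layout() from main on a typical 64-bit build, I would expect the naive offset to come out as 12 and the real one as 8, because the struct is padded from 20 to 24 bytes to keep its 8-byte alignment. The exact numbers depend on the compiler and ABI, which is the whole point.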
Have fun!
Dimitris Staikos
Thanks for posting this!
Posted by: coder | June 28, 2010 at 05:02 AM