November 30, 2008

Tips on writing C macros

Macros in C/C++ are an extremely powerful feature, basically one I can't live without. The use of macros has been widely debated and mostly they are labeled as 'evil', as a no-no. This is the usual moral dilemma that shows up whenever we have something with significant power in our hands: Anything that is powerful enough can be misused one way or the other. Should we then allow the usage of the powerful tool or ban it altogether so as to prevent its misuse (intentional or unintentional)?


November 18, 2008

The code is THE documentation

Yeah sure... Sleep tight and no one is gonna rock your boat, at least while you are sleeping in it.

This is a very alluring statement. The reason it is so alluring is that it is so obviously true. Or is it not? Doesn't the program do exactly what the code tells it to? So the code is THE one and only always up-to-date documentation of any program. Right?

It is so convincing that you just can't resist falling for it, especially when faced with an aficionado who passionately claims that she doesn't need to write comments as "extra" documentation, because the code is THE documentation, and since she obviously writes top quality code, comments are simply redundant. A waste of screen real estate more or less. Useless clutter around the actual documentation.

Sorry to break the news, boys and girls... The reality is somewhat different.

The code is the imperfect translation into a programming language of the programmer’s imperfect understanding about what the program should do.

You can figure out the rest. The code is THE documentation of the programmer's imperfect language translation of an imperfect understanding. It documents how the programmer failed to code properly what he failed to understand properly.

Anyone still battling on the issue of whether the code is THE documentation and whether comments (in any form) are needed or not, most likely hasn’t worked on a big project (300+KLOC) long enough (5+ years) to figure things out for him/herself.

Life is so sweet when 90% of the time you fool around in your own code. Everything is so simple, because everything is in your head. However once you start spending 50% of your time on pieces of code that other people wrote, pieces of code you have NEVER seen before, or pieces of your own code that you wrote 4 years ago and haven't glanced at since, then life is not as sweet.

When you read code you are not familiar with, you are doing the reverse translation, from imperfect programming language to imperfect understanding of the program's intentions. This is tough. Some may be better at it than others, but trust me... without good comments it can be much more painful and error prone than it has to be.

So please, make others' lives easier. Let them cherish your memory after you are gone and enjoy fixing the bugs in your code. Because you WILL be gone some day and your code WILL still have bugs that need fixing.

Enjoy,
Dimitris Staikos

October 31, 2008

Sloppy sizeof Programming (Variation #2)

This is another variation on my previous sloppy programming post.

When something potentially dangerous is displayed on TV then we hear them say "Don't try this at home".

Well for this kind of sloppy programming I say "Don't try this at work!".
Only try it at home, where you can merrily shoot yourself (and just yourself) in the foot as much as your heart desires, leaving the rest of the world in blissful peace.


October 24, 2008

How violating KISS can end up in 100% CPU usage by your antivirus

Hi all,

I have been using an antivirus/firewall product on 4 of my computers, 2 desktops and 2 laptops, for more than a year without any trouble and was more than happy with it for the money I paid.

However at some point I noticed that my IBM ThinkPad X41 was extremely slow. Process Explorer would show CPU utilization being constantly at 100% and the culprit was the executable of the antivirus software.

If I disabled the antivirus real-time protection then CPU usage went down to normal. When I re-enabled it, it went back to 100%.

To start with, I left the laptop on for a couple of days, thinking the antivirus might finish whatever it was doing, but of course that was a silly thought and nothing changed. Then I tried uninstalling and reinstalling it, but that was just another silly thought and of course the problem remained.

So I decided to use the artillery. I opened up Process Monitor by SysInternals just to take a look at what the antivirus was doing.

Surprise!

It was scanning over and over a 9MB html file!

The location of this file?
C:\Program Files\ThinkPad\ConnectUtilities and the name of it AddConnAdvanced.html

So I added this file to the exclusion list and my laptop became normal again.

I thought maybe my laptop downloaded some software update that screwed things up, so I decided to look at what a 9MB html file could possibly hold...

Surprise (again)!

It was the IBM diagnostics utility debug log...

Then I remembered... I had a problem with the wifi on the laptop about two months ago. So I turned on the debug diagnostics and of course forgot to turn them off even after I solved the problem.

It seems that the IBM developers wanted to appear slick I guess, so a .TXT file was not good enough for them. Instead they output an HTML file, which eventually gets huge, and because it is an HTML file it gets scanned by antivirus software (it might be malicious), ending up with a frustrated user >:( .
Bravo! Way to go!

I do device driver development for a living and as a result I have a "lean and mean" mentality, some also call it KISS (Keep It Simple Stupid). I just hate it when people use HTML/XML for something that could be done with a plain txt, or use .NET for a config utility because they want to add a silly jpg on the dialog and don't know how to do this in plain Win32.

Things are simple: When you do more than you absolutely have to, then you increase your application's "problem surface". More and more things can go wrong and you DON'T want things to go wrong with low level stuff like debug logs and config utilities.

To understand the extent of this problem, that is doing more than you have to just because it is cool, be amused to know that some PhD guys who obviously hadn't done much programming in their lives designed a camera specification called Genicam, and designed a feature by which the camera could send to the application a file that describes in a standard format all of its settings. This "standard" format of course was XML and everybody was so happy and cool. Of course the time came to write a device driver for such a camera and guess what? How on earth do you fit an XML parser in kernel mode code? Only if you want to play cowboy, and device driver developers usually don't have much free time to play cowboy. Using XML, which is user mode crap, for some simple text information that NEVER changes is against KISS.

Dude, do I hear someone saying "but what about schema validation etc"??? We are not talking about a file that got created by a user. We are talking about a file that gets stored inside a camera by the manufacturer. IT BETTER BE IN THE CORRECT FORMAT.

Anyway, although the antivirus is not to blame for my troublesome situation, it would help if they provided some statistics screen/report with the most scanned files so we can solve such problems more easily, without using supernatural powers.


Tip: If a similar thing happens to your PC and it is so slow that it is almost unusable because a process is taking 100% CPU and you don't want to kill the process (so that you can study what is going on) then try lowering the process priority of the offending process. This will permit you to use your computer again so that you can solve the problem.

Have fun,
Dimitris Staikos

August 04, 2008

Sloppy programming

Hell, it seems that the Apple incident inspired me after all and I got our SBP2 bug down :-)

That was a tough nut to crack. The SBP2 driver was failing with a sample 4TB external hard drive, actually giving me a BSOD. However this was not a nice BSOD like most others. The kernel crashed because it detected a corrupt doubly linked list, and of course I had no idea where the corruption came from.

After cleaning up a lot of stuff I found the culprit... just by looking at the debug messages. We had a failed kernel mode sanity check that led to a buffer overrun. However the buffer was at the end of a struct, so the overrun overwrote the next struct, leading to crazy behaviour when the OTHER struct was getting used, much later on.

Anyway, to cut a long story short the buffer should be 16 bytes instead of 12 bytes. I changed that and looked carefully through the code, then stepped through the initialization code and all looked nice. Sure enough there was no crash any more, however nothing else worked either :-D

So I embarked on a journey to find what the heck could be causing this and here is my finding.

Suppose you have a struct like this:

struct ORB_DATA
{
    // Some members here...

    // At the VERY END of the ORB_DATA struct.
    struct ORB_CMD orb;
};


NEVER, EVER ASSUME that:

(sizeof(ORB_DATA) - sizeof(ORB_CMD)) == FIELD_OFFSET(ORB_DATA, orb)

The compiler may align things as it sees fit, so just use FIELD_OFFSET and leave the neat tricks alone.
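
To see why the subtraction trick breaks, here is a minimal sketch in standard C. It uses offsetof from <stddef.h>, the portable counterpart of the Windows FIELD_OFFSET macro. The member layouts are made up purely for illustration (they are NOT the real SBP2 structures), and the helper function names are mine. The point is that the compiler may pad the END of the outer struct so that its size is a multiple of its strictest member alignment, and the subtraction silently counts that tail padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts, chosen only to provoke tail padding. */
struct ORB_CMD {
    unsigned char bytes[13];   /* alignment 1, size 13 */
};

struct ORB_DATA {
    uint64_t context;          /* strictest alignment in the struct */
    uint32_t flags;
    struct ORB_CMD orb;        /* at the VERY END of the struct */
};

/* The sloppy guess: assumes nothing follows `orb` inside ORB_DATA. */
size_t sloppy_orb_offset(void)
{
    return sizeof(struct ORB_DATA) - sizeof(struct ORB_CMD);
}

/* The reliable answer: ask the compiler. */
size_t real_orb_offset(void)
{
    return offsetof(struct ORB_DATA, orb);
}

/* Bytes of padding the compiler appended after `orb`. */
size_t orb_tail_padding(void)
{
    size_t end_of_orb = real_orb_offset() + sizeof(struct ORB_CMD);
    assert(end_of_orb <= sizeof(struct ORB_DATA));
    return sizeof(struct ORB_DATA) - end_of_orb;
}
```

On common x86-64 compilers the struct above is 32 bytes, so the sloppy guess says 19 while the real offset is 12. The exact numbers depend on the ABI, which is precisely why you should not compute them by hand.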

Have fun!
Dimitris Staikos

August 01, 2008

Solving an NMI crash

As I described in one of my previous posts, there is a special kind of fatal error called NMI error, a form of ultra fatal error that Windows users don't encounter as often as the infamous Blue Screen of Death (BSOD).

It is caused by some erroneous hardware behavior which, in order to make things sound 'common sense', let's say crashes the motherboard, while BSODs crash the operating system.

Of course, it is the software (operating system and device drivers) that programs the hardware, so if the hardware is not at fault (true hardware damage) then it is the software that is to blame once again :-).
Even if the chips have bugs (hardware bugs) the device driver developer is required to write code that circumvents the hardware bug so that the device operates correctly.

I have been working with Windows device driver code for a total of about 6 years in my career and of course I have encountered and solved dozens of BSODs in our drivers. But so far I had met only one NMI error caused by our driver. A couple of days ago I encountered my second NMI error and I was truly excited!!!

Why was I excited instead of disappointed and thinking that our code is crappy? For starters, I KNOW our code is not crappy (famous last words) so I couldn't possibly worry about that.
There are two reasons why I was excited:
(1) One NMI error in 6 years means that you don't get the chance to debug such beasts very often. The more often you get to debug such errors the more experienced you become in the sorts of things that cause NMIs and how to debug them. This kind of experience is extremely hard to get and extremely valuable.
(2) Secondly, as I explained in my previous NMI post, the software was running smoothly everywhere but was causing an NMI on some semi-exotic platform. This was exactly the case once again, an exotic Xeon-based super server. If we have clients that decide to run our software even on such exotic machines then (a) we must be having a lot of clients (since only a small percentage use exotic machines overall) and (b) they trust our software enough to use it in extremely demanding applications that require exotic machines.

In both NMIs the story was the same: the client, let's say, started using our software on Series 3 of the server, then upgraded to Series 4, then Series 5, and when they tried to upgrade to Series 6 (in their labs of course) they found out that an NMI was occurring.
So there is no question of trust in our code and our company. Our clients KNOW that our software works correctly and they obviously realize that we don't have the latest and greatest version of every exotic server available in our lab to test with, so they tell us about the error, we get it fixed and both parties are totally happy and excited!

So if there is a common sense lesson here for ordinary users it is this one: If you decide to build your own super exotic PC with a uniquely super cool combination of the latest-greatest-fanciest hardware don't act that much surprised if you have driver problems (BSODs or more likely NMIs). If you want to stay out of trouble then shoot for something that is less exotic. Or at least ask your PC provider to build your PC and then test it a bit before you pay for it and take it home.

OK then, just for the record, let me tell you what was causing the NMI.

Initially I thought that it was a cache-coherency issue that made the 1394 adapter read garbage for the context program of its DMA context. A cache-coherency issue means that the code updates some memory, but the new contents are still sitting inside the CPU cache because that portion of the cache has not been flushed to main memory yet.
However devices read their context programs from main memory, so if the data has not reached there yet the device will read garbage and act accordingly. Too bad that devices can't pop up error message boxes to the user :-D
By examining the code carefully for that kind of error I located a little well-hidden window of opportunity where it could occur. So I fixed this bug, but lucky me, it was not the cause of the NMI. The NMI persisted.

Then I fired up my 1394 Bus Analyzer and after several experiments and server crashes one thing was evident. The 1394 adapter would always transmit exactly 51 packets before NMIing. Now, *always* and *exactly 51* are a strong indication that something a little special must be happening on the 52nd packet.
I knew of course that each packet uses 5 DMA descriptors in the DMA context program, each descriptor being 16 bytes, so I did some simple math: 5*16*51=4080. BINGO!!!
Each physical page of memory has 4096 bytes, so the 5 descriptors of the 52nd packet were on a physical page boundary. That's always a good start, although the DMA context program is being written in 'physically contiguous' pages so crossing the boundary shouldn't result in any surprises.

The NMI was caused by what we call 'isochronous transmit'. But there is also 'Isochronous receive' and it also uses 5 descriptors of 16 bytes each, but DID NOT crash on the 52nd packet, on the same machine of course.
How could that possibly be?

I studied the chip specs closely for any mention of anything related to physical page boundaries for the DMA context programs and sure enough... there was nothing. No restrictions whatsoever were mentioned.

Mind you, the code that is preparing the isochronous transmit context program was originally written back in 1999 so it has been literally tested on thousands of computers since then. This was sure a neat thing that was going down here. (Note: 1999 is more than 6 years ago, but I didn't work with drivers all the time ;-))

Then I studied our code again and soon I found out that the isochronous transmit was using 5 descriptors but the first one was a bit special, in fact something like a "double" descriptor. Since 4096-4080=16, it became evident that this 32-byte "double" descriptor was getting split across two physical pages (always physically contiguous in memory).

Hmmm, I thought, maybe this machine is not too happy with this fact and crashes with the NMI at the moment the 1394 DMA chip is trying to read the 32 bytes of the next descriptor in one operation from two different physical pages.
This sounded plausible enough in my mind, so I decided to give it a try.
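
The arithmetic behind this theory is easy to replay. The following is only a sketch, not our driver code: it assumes the context program starts on a page boundary, uses the sizes quoted above (16-byte descriptors, a 32-byte "double" descriptor leading each packet, 4096-byte pages), and the function names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES       4096u
#define DESCRIPTOR_SIZE  16u
#define FIRST_DESC_SIZE  32u   /* the "double" descriptor */

/* Nonzero if the byte range [offset, offset + length) spans two pages. */
int crosses_page(uint32_t offset, uint32_t length)
{
    assert(length > 0);
    return (offset / PAGE_BYTES) != ((offset + length - 1) / PAGE_BYTES);
}

/*
 * Zero-based index of the first packet whose leading "double" descriptor
 * straddles a page boundary, or -1 if none of the first max_packets does.
 * Assumes the context program begins at offset 0 of a page.
 */
int first_straddling_packet(uint32_t descriptors_per_packet, int max_packets)
{
    uint32_t stride = descriptors_per_packet * DESCRIPTOR_SIZE;
    for (int n = 0; n < max_packets; n++) {
        if (crosses_page((uint32_t)n * stride, FIRST_DESC_SIZE))
            return n;
    }
    return -1;
}
```

Calling first_straddling_packet(5, 1000) returns 51, the zero-based index of the 52nd packet: its double descriptor starts at offset 4080 and ends at offset 4111, on the next page.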

Of course I was not sure that it was the correct reason, I mean it works everywhere else right?

I would have to change the code first then run it and see if the NMI goes away. But I estimated the correction to be at least a week's worth of coding, because too many things had to change in order to accommodate the required 'holes' in the context program.
Classic case of a Catch-22. I can't put in 4-5 days of work just to test an idea! What if it was not the correct one?

Then something else came to mind... a quick and dirty solution... If I add 3 nop descriptors (nop="no op"="no operation") to each packet then I will have 8*16 bytes = 128 bytes. And sure enough 4096 is divisible by 128, so I would never have the special descriptor on a page boundary.
Of course this means 48 wasted bytes of physical memory for each packet, which is a very precious resource. The requests we deal with may contain thousands of packets, so that would be a non-trivial waste.
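
Here is a small sketch of why the padding works and what it costs. As before it assumes a page-aligned context program and the sizes mentioned in this post; the function names are hypothetical and this is not the code that went into the driver.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES       4096u
#define DESCRIPTOR_SIZE  16u
#define FIRST_DESC_SIZE  32u   /* the "double" descriptor */

/* Count packets whose leading 32-byte descriptor spans two pages. */
uint32_t count_straddling_packets(uint32_t descriptors_per_packet,
                                  uint32_t packets)
{
    uint32_t stride = descriptors_per_packet * DESCRIPTOR_SIZE;
    uint32_t count = 0;

    assert(stride > 0);
    for (uint32_t n = 0; n < packets; n++) {
        uint32_t first = n * stride;                  /* first byte */
        uint32_t last = first + FIRST_DESC_SIZE - 1;  /* last byte  */
        if (first / PAGE_BYTES != last / PAGE_BYTES)
            count++;
    }
    return count;
}

/* Extra bytes spent on nop descriptors for a request of `packets` packets. */
uint32_t nop_padding_bytes(uint32_t nops_per_packet, uint32_t packets)
{
    return nops_per_packet * DESCRIPTOR_SIZE * packets;
}
```

With 8 descriptors per packet every packet starts at a multiple of 128, so count_straddling_packets(8, ...) stays at zero no matter how long the request is, while with 5 descriptors it does not. The price is nop_padding_bytes(3, 1000) = 48000 bytes for a request of a thousand packets.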

I didn't have to think twice before I decided that this was not an acceptable solution, but it was just fine in order to try out my theory!

And then came that precious moment of glory!!!
It worked like a charm :-)
My theory was right and bye-bye NMI #2.

Then I embarked on a 4-day effort to implement the proper solution that doesn't waste memory and works too.

Isn't it just so cool being a driver developer in your spare time? :-D

Have fun!
Dimitris Staikos

July 26, 2008

Error Handling Crimes

Hell, I am furious! I come to work on a bloody Saturday, to review some code so that we can release our new product on time and I stumble twice on one of the most basic coding crimes when it comes to error handling, each done by a different developer.

I know it's natural to make errors, that's why I do the code review in the first place right? Absolutely! The reason that I am furious is that one of the two instances of this crime was committed to our product (an SDK) long ago and thus I cannot correct it... because... I will break backwards compatibility. Damn it!

The other one is not released yet, so it's just a human error; we'll just have to update about 70 calls to some internal function and we are done.

So what exactly is this crime I am talking about?

Simply put, it is a CAPITAL OFFENSE to convert an error code to a boolean Success_or_Failure flag.
A variation of this crime is to convert a rich error code into something like E_FAIL (in the COM world).
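
To make the crime concrete, here is a minimal sketch. Everything in it (the sdk_status codes, open_device, start_capture) is invented for the illustration and has nothing to do with our actual SDK.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical error codes for an imaginary SDK. */
typedef enum {
    SDK_OK = 0,
    SDK_ERR_DEVICE_NOT_FOUND,
    SDK_ERR_ACCESS_DENIED,
    SDK_ERR_OUT_OF_MEMORY
} sdk_status;

/* Stand-in for a lower layer that knows exactly what went wrong. */
sdk_status open_device(int device_index)
{
    if (device_index < 0)
        return SDK_ERR_DEVICE_NOT_FOUND;
    if (device_index == 0)
        return SDK_ERR_ACCESS_DENIED;
    return SDK_OK;
}

/* The crime: the rich code is collapsed into true/false and lost. */
bool start_capture_lossy(int device_index)
{
    return open_device(device_index) == SDK_OK;
}

/* The alternative: hand the original code to the caller untouched. */
sdk_status start_capture(int device_index)
{
    sdk_status status = open_device(device_index);
    if (status != SDK_OK)
        return status;
    /* ... the real work would go here ... */
    return SDK_OK;
}

/* Something tech support can actually work with. */
const char *sdk_status_text(sdk_status status)
{
    switch (status) {
    case SDK_OK:                   return "success";
    case SDK_ERR_DEVICE_NOT_FOUND: return "device not found";
    case SDK_ERR_ACCESS_DENIED:    return "access denied";
    case SDK_ERR_OUT_OF_MEMORY:    return "out of memory";
    }
    return "unknown error";
}
```

Both failures look identical through start_capture_lossy, while start_capture still lets the caller tell "device not found" apart from "access denied". Once the boolean version has shipped in a public interface, the information is gone for good.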

In real life, most programs cannot do much in the face of errors, other than (a) not crash (b) let the user know what went wrong.

In real life, when something goes wrong the user will immediately contact technical support. So imagine being the tech personnel, having a frustrated and often angry user telling you "Hey, I tried to do such and such and it fails with the error E_FAIL (Unspecified Error). What the hell is going on?".

Exactly my point! What the hell is going on???? How is our tech support team supposed to provide decent technical support if we don't know exactly what error occurred?

Most developers I have encountered in my career so far don't have the proper understanding and respect for error handling code. They want to write the REAL code not the error handling crap (or heaven forbid... the documentation).
They don't understand or just don't want to realize that in solid systems the error handling code can be up to 50% of the code. Yeap, that's FIFTY percent. And if you don't write it properly then you are NOT doing 50% of your job. Think a little bit about it.

Moreover, in most cases you don't write the error handling code for your end user, you write it FOR YOUR OWN SAKE, to make YOUR LIFE easier when something goes wrong for the user. Because as sure as death and taxes, things going wrong for the user is a certainty.

OK, now that I've had my fair share of whining I am not furious any more... Back to business :-)

Have fun!

November 26, 2007

Demystifying Device Driver Development (DDDD)

It just occurred to me recently that most people don't really have a clue about what it is that Device Driver Developers (DDDs) do for a living. Since I am a DDD myself, among other things, I thought I should make an attempt at describing what I do in a way that Common People can hopefully digest.

Since I enjoy making wannabe hilarious Tom & Jerry dialogs, I will try to put it in a dialog between Jerry the Technical Interviewer and myself. I will intentionally wear the hat of the anti-social, psychopath, under-the-hood programmer and let Jerry extract all the information from me in a criminal-interrogation-like style.

Here we go...

Jerry: So you are a DDD.
Me: Yeap.
Jerry: Can you tell me what it is that you do?
Me: Yeap.
Jerry: I mean you build device drivers right?
Me: Yeap.
Jerry: What exactly is a device driver?
Me: It's just a piece of software that controls a device. Don't you know that? Are you wasting my time?
Jerry: Yes, I guess I do, ehhh I mean I do know, but what exactly is a device?
Me: That's an interesting question so I will dignify it with an answer. A device is a clever piece of hardware. To make things simple, imagine that a device has a little CPU of its own, a CPU that can execute various tasks depending on the hardware at hand.
Jerry: Wow64, there is a CPU on these things? A CPU on a PCI card that I buy off the shelf?
Me: Duh, but of course, how else do you think they would do any useful work?
Jerry: I thought the computer's CPU was the only CPU on the block.
Me: No, devices have CPUs and depending on the hardware they can be pretty sophisticated.
Jerry: That's awesome! What can you tell me about these little CPUs?
Me: For starters, they are not little. Most of them can do several tasks at the same time and they can be pretty darn fast.
Jerry: Hmmm, wherever there is a CPU, there is a program right?
Me: Yeap.
Jerry: Well, where are the programs that these CPUs execute? Firmware stored in flash memory maybe?
Me: You'd wish.
Jerry: Please elaborate, I'm getting really curious here.
Me: OK, if you insist... These programs are generated by the device driver.
Jerry: Pardon me?
Me: The programs that the device CPUs execute are generated by the device drivers. Clearer now?
Jerry: Wow64!
Me: Indeed.
Jerry: Generated by the device driver... In what language?
Me: That was a good one! Machine language of course.
Jerry: You mean mov ecx,edx and the like?
Me: Well, in a sense yes. Each device CPU has its own machine language.
Jerry: You are kidding me right?
Me: No way. Don't be an idiot.
Jerry: So the device driver builds machine language programs for the device CPU?
Me: Actually, it's even worse than that. These CPUs usually have many sub-CPUs, execution contexts we call them. The device driver generates programs for several (if not all) execution contexts. Each program is called a context program.
Jerry: Wow64 once again! And where are these context programs located at runtime?
Me: Well, in most cases in main memory, the little thingy you call RAM. Sometimes devices may have onboard memory for storing these programs, but in most cases it's just RAM.
Jerry: In RAM? Wait wait... And how can the device CPU read from RAM? Can it handle virtual addresses and the like?
Me: Man, you are really something! Virtual addresses? What are you talking about? When was the last time you heard the term "Physical Address"? These are the kind of things these CPUs can swallow.
Jerry: Physical Address? What are YOU talking about? Are we talking about DOS here?
Me: LOL. Please realize NOW that devices are what's called "Close To The Metal". They don't and they won't understand virtual crap of any sort. They operate at the PCI level, so they will only understand PCI-level physical addresses.
Jerry: Then how does it all work?
Me: Well the device driver allocates some piece(s) of memory where it intends to store the context programs and then kindly asks the operating system to give it the physical addresses of these pieces of memory. The OS happily obliges and then the device driver prepares the context programs, using physical addresses wherever a jump is required. When the context programs are ready, the device driver sets the Program Counter register on the various sub-CPUs of the device and then starts these CPUs. Ain't it cute?
Jerry: Wowowowowow64!!! That's super cool.
Me: Yeap.
Jerry: If I understand well, the device driver is like a mini operating system for a device, right?
Me: You couldn't phrase it better.
Jerry: Now I realize why being a DDD is so tough! You have to write a mini OS of your own and make the device do your bidding!
Me: Yeah, but the truth is making the context programs and handling their operation is the easy part.
Jerry: Huh? And what is the difficult part?
Me: ERROR HANDLING.
Jerry: Please please, please elaborate!
Me: Well, it's no big deal for an experienced DDD to write a device driver for a well behaved device. The tough thing is to write a driver that does not malfunction or crash the OS in the presence of device errors. These do not happen often so it is really hard to test the error handling code. Actually if you write ANY decent program, user mode or kernel, error handling largely amounts to 50% of the code. If it does not amount to 50% in your code, then you haven't done enough error handling. No developer likes error handling (with the exception of myself of course). It's boring, it's nasty, it's creepy. But unless you do it and you TEST it thoroughly you can't write decent software. For device drivers this is even more crucial, since in the presence of device errors the best thing that you can hope for is that the device stops functioning (we say the device is dead). However in the majority of cases (at least in Windows) you get an OS crash.
Jerry: And how do YOU test your error handling code?
Me: That's a trade secret.
Jerry: C'mon now...
Me: I keep an arsenal of malfunctioning devices at my office. Everyone in my team has strict instructions to immediately hand me any device that appears to be malfunctioning. They are my secret treasure.
Jerry: But how can you be sure that your driver handles correctly all possible errors?
Me: I am not.
Jerry: You're NOT???
Me: Well, why do you think it is that people still get BSODs when something unusual happens? Even if you get a driver developed by a team that takes error handling very seriously, they can't test everything, simply because they haven't seen everything yet. So you get BSODs.
Jerry: I am shocked. This is brutal.
Me: Well, as I've said elsewhere BSODs are a blessing. As far as your complaints are concerned, please pass them on to the device manufacturers. If they would put some "test mode" into their devices and make them fail randomly then MY job would be MUCH easier. But these things would have to go to the chip level and then they would have a significant added cost as you realize. So practically everyone in the industry operates in a "hoping for the best" mode.
Jerry: ...
Me: You can't imagine what I have to go through to make buggy devices operate correctly. In your mind the term "bug" is most probably related to software, but let me break some news to you: There are COUNTLESS bugs in hardware chips and devices, and poor DDDs like me have to struggle hard to ship a decent driver only to have lusers scream their lungs out at ME when they get a BSOD. They scream because probably I didn't handle 100% correctly all the bugs in the hardware chips. But who am I supposed to scream at? Texas Instruments? Intel? You get the picture.
Jerry: Well, I can only say that under this new light I have a whole new appreciation for the work you DDDs are doing.
Me: Darn right you should.
Jerry: My special thanks for this enlightening interview.
Me: May the force be with you.
Jerry: Any last minute advice for Common People who use PCs?
Me: Don't add more RAM to your PC unless you know what you are doing.
Jerry: Excuse me? Don't add more RAM? What harm could additional RAM possibly cause???
Me: I will describe that in a separate post, when I feel like having mercy on the masses. For the time being I know, and you don't. Huh!
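
For readers who prefer code to interrogations, here is a minimal sketch of what "using physical addresses wherever a jump is required" could look like. The 16-byte descriptor layout, the field names and the function are all hypothetical; real descriptor formats are chip-specific, and a real driver gets the physical addresses from the operating system rather than as plain parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte descriptor; real layouts are chip-specific. */
struct descriptor {
    uint32_t control;      /* command and length (made-up encoding)      */
    uint32_t data_phys;    /* physical address of the data buffer        */
    uint32_t branch_phys;  /* physical address of the next descriptor,
                              0 means "stop here"                        */
    uint32_t status;       /* written back by the device                 */
};

/*
 * Fill `count` descriptors. `prog` is the address the CPU uses to write
 * the program; `prog_phys` is the physical address of that same memory,
 * which is the only kind of address the device understands.
 */
void build_context_program(struct descriptor *prog, uint32_t prog_phys,
                           uint32_t data_phys, uint32_t buf_size,
                           size_t count)
{
    assert(prog != NULL);
    for (size_t i = 0; i < count; i++) {
        prog[i].control = buf_size;
        prog[i].data_phys = data_phys + (uint32_t)(i * buf_size);
        prog[i].branch_phys = (i + 1 < count)
            ? prog_phys + (uint32_t)((i + 1) * sizeof prog[0])
            : 0;
        prog[i].status = 0;
    }
}
```

Note that the jump targets are computed from prog_phys and never from the prog pointer itself: the pointer is a virtual address, and the device would have no idea what to do with it.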

Have fun!
Dimitris Staikos

October 27, 2007

1394 DMA Multiplexing sneak preview

Today I got a new, highly demanding piece of code working in our drivers. What it does is to receive streaming video from multiple 1394 cameras using a single DMA context on the 1394 adapter. Up till now each camera would have its dedicated DMA context. Since there are only four of those on most 1394 adapters, you could only receive 4 camera video streams per adapter. If you needed more you had to install a second 1394 adapter, etc.
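
To give a rough idea of what multiplexing involves on the receive side, here is a sketch of the demultiplexing step only. It is NOT our driver code. It assumes each received packet comes with its IEEE 1394 isochronous header quadlet in host byte order, with the data length in the upper 16 bits and the 6-bit channel number in bits 13..8, and it merely keeps per-channel counters where a real driver would hand the payload to the right camera stream. All names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHANNELS 64u   /* the channel field is 6 bits wide */

/* Per-camera bookkeeping; a real driver would queue the payload instead. */
struct stream_stats {
    uint32_t packets;
    uint32_t payload_bytes;
};

/* Channel number: bits 13..8 of the isochronous header quadlet. */
unsigned iso_channel(uint32_t header)
{
    return (header >> 8) & 0x3Fu;
}

/* Payload length in bytes: the upper 16 bits of the header quadlet. */
unsigned iso_data_length(uint32_t header)
{
    return header >> 16;
}

/* Route every received header to the stream that owns its channel. */
void demultiplex(const uint32_t *headers, size_t count,
                 struct stream_stats stats[MAX_CHANNELS])
{
    assert(headers != NULL && stats != NULL);
    for (size_t i = 0; i < count; i++) {
        struct stream_stats *s = &stats[iso_channel(headers[i])];
        s->packets++;
        s->payload_bytes += iso_data_length(headers[i]);
    }
}
```

With one DMA context per camera the hardware does this sorting for you. With a single shared context the packets of all cameras arrive interleaved, so the software has to look at every header and sort them out itself.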

DMA Multiplexing was a feature that we had been talking about for years at Unibrain. It would always get pushed back in the schedule since it was considered a "complex" item. Finally we did it. Of course there is more dirty work to get done, but the tricky part is over, and the important result is that the multiplexing DMA context can really handle some serious load without flinching.

I know some people like sneak technology previews so here is a photo of one of my test PCs displaying 6 cameras using a single DMA context:

[Photo Img_5789_2: one of my test PCs displaying six camera streams over a single DMA context]

If you zoom on the actual 5MP picture (1.5MB download), low on the right side you can see the 1394 bus topology, consisting of two PCs (nodes 0 and 9), six cameras, 1 repeater (node 1) and 1 analyzer (node 3). The monitor is on the blue node (node 0).
CPU usage (37%) is pretty decent for a slightly dated single CPU system.

Of course there is no way you can verify, just by looking at the photo, that all six video streams are on a single DMA context. You'll have to trust me on that :-)

Have fun!

October 19, 2007

Undefined Behavior (NMI quiz - Part 2)

It has been some time since my last Tom & Jerry dialog.

Jerry: Hey Tom, could you help me out here? I am kind of confused...
Tom: Sure thing Jerry! I'd be glad.
Jerry: Do you happen to know what "Undefined Behavior" is?
Tom: Well, isn't it when the results of an operation are unpredictable?
Jerry: I know, but isn't that a definition for this kind of behavior?
Tom: Uhmmm, I guess so...
Jerry: So, you have just defined "undefined behavior".
Tom: Uhmmm, I guess so...
Jerry: So, what's undefined about it, since you just defined it?
Tom: I hate it when you do that! Leave me alone!

Undefined Behavior is a funny term, a misnomer if I may say so.
If you ask me, there is NO such thing as "Undefined Behavior"
Even random behavior is not undefined. It is defined as random.

Instead, there is behavior undefined BY someone.

If you read about Undefined Behavior on Wikipedia you will notice the following in the first paragraph:

... the specification leaves the results of certain operations specifically undefined.

It is the specification that does not define the result of the operations, so implementations are free to do what they feel like, plus tell no one about it (if you are required to document it then it is implementation-defined).
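
The C language makes the same distinction, so here is a small sketch to illustrate it (my own example, unrelated to the OHCI case; the function names are made up). Signed integer overflow is left undefined by the C standard, so careful code checks BEFORE adding. The result of right-shifting a negative value, on the other hand, is implementation-defined: the compiler must pick a behavior and document it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Adds a and b without ever executing a signed overflow, which the C
 * standard leaves undefined. Returns 1 and stores the sum on success,
 * 0 if the sum would not fit in an int.
 */
int checked_add(int a, int b, int *sum)
{
    assert(sum != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *sum = a + b;
    return 1;
}

/*
 * For negative values the result of >> is implementation-defined:
 * the behavior may differ between compilers, but each one documents it.
 * For non-negative values it is fully specified.
 */
int shift_right_one(int value)
{
    return value >> 1;
}
```

The unchecked a + b may appear to work for years on every compiler you try, which is exactly the kind of "perfectly stable" undefined behavior this post is about.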

What does all this have to do with NMIs? Well, by now it should be evident.

The NMI I got bitten by last month was caused by "Undefined Behavior", as described someplace inside the OHCI 1.1 specification.

In my case, up to that point in time, the undefined behavior our code was stepping on was not undefined at all. It was perfectly stable and correct behavior. But one day, given the proper hardware combination, it decided to flip and do an NMI instead. Cool, but still perfectly defined :-)

So to answer the quiz, yes, it was once again the software's fault :-)

Have fun!