
November 26, 2007

Demystifying Device Driver Development (DDDD)

It just occurred to me recently that most people don't really have a clue about what it is that Device Driver Developers (DDDs) do for a living. Since I am a DDD myself, among other things, I thought I should attempt to describe what I do in a way that Common People can hopefully digest.

Since I enjoy making wannabe-hilarious Tom & Jerry dialogs, I will put it as a dialog between Jerry the Technical Interviewer and myself. I will intentionally wear the hat of the antisocial, psychopathic, under-the-hood programmer and let Jerry extract all the information from me in a criminal-interrogation-like style.

Here we go...

Jerry: So you are a DDD.
Me: Yeap.
Jerry: Can you tell me what it is that you do?
Me: Yeap.
Jerry: I mean you build device drivers right?
Me: Yeap.
Jerry: What exactly is a device driver?
Me: It's just a piece of software that controls a device. Don't you know that? Are you wasting my time?
Jerry: Yes, I guess I do, ehhh I mean I do know, but what exactly is a device?
Me: That's an interesting question so I will dignify it with an answer. A device is a clever piece of hardware. To make things simple, imagine that a device has a little CPU of its own, a CPU that can execute various tasks depending on the hardware at hand.
Jerry: Wow64, there is a CPU on these things? A CPU on a PCI card that I buy off the shelf?
Me: Duh, but of course, how else do you think they would do any useful work?
Jerry: I thought the computer's CPU was the only CPU in the block.
Me: No, devices have CPUs and depending on the hardware they can be pretty sophisticated.
Jerry: That's awesome! What can you tell me about these little CPUs?
Me: For starters, they are not little. Most of them can do several tasks at the same time and they can be pretty darn fast.
Jerry: Hmmm, wherever there is a CPU, there is a program right?
Me: Yeap.
Jerry: Well, where are the programs that these CPUs execute? Firmware stored in flash memory maybe?
Me: You wish.
Jerry: Please elaborate, I'm getting really curious here.
Me: OK, if you insist... These programs are generated by the device driver.
Jerry: Pardon me?
Me: The programs that the device CPUs execute are generated by the device drivers. Clearer now?
Jerry: Wow64!
Me: Indeed.
Jerry: Generated by the device driver... In what language?
Me: That was a good one! Machine language of course.
Jerry: You mean mov ecx,edx and the like?
Me: Well, in a sense yes. Each device CPU has its own machine language.
Jerry: You are kidding me right?
Me: No way. Don't be an idiot.
Jerry: So the device driver builds machine language programs for the device CPU?
Me: Actually, it's even worse than that. These CPUs usually have many sub-CPUs, execution contexts we call them. The device driver generates programs for several (if not all) execution contexts. Each program is called a context program.
Jerry: Wow64 once again! And where are these context programs located at runtime?
Me: Well, in most cases in main memory, the little thingy you call RAM. Sometimes devices may have onboard memory for storing these programs, but in most cases it's just RAM.
Jerry: In RAM? Wait wait... And how can the device CPU read from RAM? Can it handle virtual addresses and the like?
Me: Man, you are really something! Virtual addresses? What are you talking about? When was the last time you heard the term "Physical Address"? These are the kind of things these CPUs can swallow.
Jerry: Physical Address? What are YOU talking about? Are we talking about DOS here?
Me: LOL. Please realize NOW that devices are what's called "Close To The Metal". They don't and they won't understand virtual crap of any sort. They operate at the PCI level, so they will only understand PCI-level physical addresses.
Jerry: Then how does it all work?
Me: Well the device driver allocates some piece(s) of memory where it intends to store the context programs and then kindly asks the operating system to give it the physical addresses of these pieces of memory. The OS happily obliges and then the device driver prepares the context programs, using physical addresses wherever a jump is required. When the context programs are ready, the device driver sets the Program Counter register on the various sub-CPUs of the device and then starts these CPUs. Ain't it cute?
Jerry: Wowowowowow64!!! That's super cool.
Me: Yeap.
Jerry: If I understand well, the device driver is like a mini operating system for a device, right?
Me: You couldn't phrase it better.
Jerry: Now I realize why being a DDD is so tough! You have to write a mini OS of your own and make the device do your bidding!
Me: Yeah, but the truth is making the context programs and handling their operation is the easy part.
Jerry: Huh? And what is the difficult part?
Me: ERROR HANDLING.
Jerry: Please please, please elaborate!
Me: Well, it's no big deal for an experienced DDD to write a device driver for a well-behaved device. The tough thing is to write a driver that does not malfunction or crash the OS in the presence of device errors. These do not happen often, so it is really hard to test the error handling code. Actually, if you write ANY decent program, user mode or kernel, error handling amounts to roughly 50% of the code. If it does not amount to 50% in your code, then you haven't done enough error handling. No developer likes error handling (with the exception of myself of course). It's boring, it's nasty, it's creepy. But unless you do it and you TEST it thoroughly you can't write decent software. For device drivers this is even more crucial, since in the presence of device errors the best thing that you can hope for is that the device stops functioning (we say the device is dead). However, in the majority of cases (at least in Windows) you get an OS crash.
Jerry: And how do YOU test your error handling code?
Me: That's a trade secret.
Jerry: C'mon now...
Me: I keep an arsenal of malfunctioning devices at my office. Everyone in my team has strict instructions to immediately hand me any device that appears to be malfunctioning. They are my secret treasure.
Jerry: But how can you be sure that your driver correctly handles all possible errors?
Me: I am not.
Jerry: You're NOT???
Me: Well, why do you think it is that people still get BSODs when something unusual happens? Even if you get a driver developed by a team that takes error handling very seriously, they can't test everything, simply because they haven't seen everything yet. So you get BSODs.
Jerry: I am shocked. This is brutal.
Me: Well, as I've said elsewhere, BSODs are a blessing. As far as your complaints are concerned, please pass them on to the device manufacturers. If they would put some "test mode" into their devices and make them fail randomly, then MY job would be MUCH easier. But these things would have to go down to the chip level, and they would have a significant added cost, as you realize. So practically everyone in the industry operates in a "hoping for the best" mode.
Jerry: ...
Me: You can't imagine what I have to go through to make buggy devices operate correctly. In your mind the term "bug" is most probably related to software, but let me break some news to you: There are COUNTLESS bugs in hardware chips and devices, and poor DDDs like me have to struggle hard to ship a decent driver only to have lusers scream their lungs out at ME when they get a BSOD. They scream because probably I didn't handle 100% correctly all the bugs in the hardware chips. But who am I supposed to scream at? Texas Instruments? Intel? You get the picture.
Jerry: Well, I can only say that under this new light I have a whole new appreciation for the work you DDDs are doing.
Me: Darn right you should.
Jerry: My special thanks for this enlightening interview.
Me: May the force be with you.
Jerry: Any last minute advice for Common People who use PCs?
Me: Don't add more RAM to your PC unless you know what you are doing.
Jerry: Excuse me? Don't add more RAM? What harm could additional RAM possibly cause???
Me: I will describe that in a separate post, when I feel like having mercy on the masses. For the time being I know, and you don't. Huh!
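
P.S. For readers who prefer code to interrogations, here is a toy sketch of the flow described above: place the context program in memory, use physical addresses for the jumps, set the Program Counter, start the device. It is a Python simulation, not driver code, and everything in it is an assumption made up for illustration: the opcodes, the 8-byte instruction format, the `FakeDevice` class and the address 0x40. A real driver obtains physical addresses from the operating system, and each real device defines its own machine language.

```python
import struct

# Hypothetical opcodes for an imaginary device; real hardware defines its own.
OP_OUTPUT, OP_JUMP, OP_HALT = 1, 2, 3
INSTR_SIZE = 8  # one instruction = two little-endian 32-bit words


def encode(instructions):
    """Pack (opcode, operand) pairs into the device's made-up machine language."""
    return b"".join(struct.pack("<II", op, arg) for op, arg in instructions)


def build_context_program(base_phys):
    """Build one context program whose jump target is a physical address."""
    return encode([
        (OP_OUTPUT, 0x11),
        (OP_JUMP, base_phys + 3 * INSTR_SIZE),  # jump over the next instruction
        (OP_OUTPUT, 0xBAD),                     # never executed
        (OP_OUTPUT, 0x22),
        (OP_HALT, 0),
    ])


class FakeDevice:
    """Stand-in for one execution context; it only understands physical addresses."""

    def __init__(self, ram):
        self.ram = ram              # simulated physical memory (index = address)
        self.program_counter = 0
        self.output = []

    def run(self, max_steps=100):
        """Execute until HALT. Return False if the device 'dies' on the way."""
        for _ in range(max_steps):
            op, arg = struct.unpack_from("<II", self.ram, self.program_counter)
            if op == OP_OUTPUT:
                self.output.append(arg)
                self.program_counter += INSTR_SIZE
            elif op == OP_JUMP:
                self.program_counter = arg
            elif op == OP_HALT:
                return True
            else:
                return False        # unknown opcode: the device is dead
        return False                # runaway program


def start_device(ram, buffer_phys):
    """The driver's part: store the program, set the PC, start the device."""
    program = build_context_program(buffer_phys)
    ram[buffer_phys:buffer_phys + len(program)] = program
    device = FakeDevice(ram)
    device.program_counter = buffer_phys
    ok = device.run()
    return ok, device.output


if __name__ == "__main__":
    ram = bytearray(256)            # pretend this is physical RAM
    ok, output = start_device(ram, buffer_phys=0x40)
    print(ok, [hex(v) for v in output])
```

Note how `run` returns `False` instead of blowing up when it meets garbage: that branch is the error handling, and it is exactly the part that rarely gets exercised unless you feed the thing a broken program on purpose.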

Have fun!
Dimitris Staikos

November 23, 2007

The new generation

I am in a fancy mood tonight, so I thought I could share some personal stuff.

Based on the photo below I have some first indication that my son will get involved in the same profession that both his dad and mom are in. He's not yet even 4 and a half years old and he can use our tablet laptop with remarkable ease.

[Photo: Lambrosatwork]

Have fun!
Dimitris Staikos

November 22, 2007

What are you talking about?

Well, sometimes software engineers can get really inventive. And when that happens usability goes down the drain.

Here is a one-of-a-kind dialog box, a monument of inventiveness:

[Screenshot: Questiononcaptionbar_4]

A question on the dialog caption? On the dialog caption, for heaven's sake???
People expect to see the program name on the caption, or something equivalent, and thus almost never look at what the caption says. So the average user is confronted with an answer but he can't see the question.

Needless to say this dialog box also demonstrates additional usability bugs:

  • First of all it traps the user. There is no Cancel button for the poor user that is not certain about what to do.
  • Second, YES/NO dialogs are the lazy programmer's way out when he needs feedback from the user. The dialog should really have a Disable and a Cancel button. However this means that the programmer must create a new dialog resource, a dialog class, etc, so most opt out and select a YES/NO question which is readily available through the Win32 MessageBox function.

    Still, if you insist on using MessageBox, you can put your brains to work and structure the question in a way that it has a YES/NO answer.

    "If you select YES such and such will happen ..." is plain silly.

    Almost any question can be formed as follows:

    The application can do blah blah...
    Do you want to do that?
    YES/NO/CANCEL
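
A minimal sketch of that phrasing, assuming Python with `ctypes` on Windows. `MessageBoxW` and the `MB_*` / `ID*` values are the real Win32 ones; the helper names (`build_prompt`, `interpret`, `ask`) and the wording of the results are made up for illustration.

```python
import sys

# Win32 constants (winuser.h)
MB_YESNOCANCEL = 0x00000003
MB_ICONQUESTION = 0x00000020
IDCANCEL, IDYES, IDNO = 2, 6, 7


def build_prompt(what_the_app_can_do):
    """State what can happen, then ask a question that YES/NO really answers."""
    return f"{what_the_app_can_do}\n\nDo you want to do that?"


def interpret(button_id):
    """Map the pressed button to a decision; anything unexpected means cancel."""
    return {IDYES: "do it", IDNO: "skip it"}.get(button_id, "cancel")


def ask(what_the_app_can_do, app_name):
    """Show the question in the dialog body and the program name on the caption."""
    if sys.platform != "win32":
        raise OSError("MessageBoxW is only available on Windows")
    import ctypes
    button = ctypes.windll.user32.MessageBoxW(
        None,
        build_prompt(what_the_app_can_do),
        app_name,
        MB_YESNOCANCEL | MB_ICONQUESTION,
    )
    return interpret(button)
```

The question lives in the dialog body, the caption carries the program name, and the user who is not certain still has a Cancel button.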

Have fun!
Dimitris Staikos

Hardware stack

I am a device driver developer and Unibrain, the company I work for, produces what's called a software stack. In plain text this means several pieces of software that are layered, one on top of the other. Each piece uses the services of the layer below it and provides services to the layer above it.

In Windows kernel programming the concept of a "device stack" is prevalent: one device object attached to another device object, attached to another device object, etc.

This is a pretty common concept in software engineering.
Mind you, software stacks in the above sense have nothing to do with the "stack" data structure.
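
To make the layering concrete, here is a small Python sketch. It is only an illustration of the idea, not how Windows device objects are implemented; the `Layer` class, `build_stack` and the layer names in the usage example are all made up.

```python
class Layer:
    """One layer of a software stack: serves the layer above, uses the one below."""

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower          # the layer this one is attached to, if any

    def handle(self, request):
        """Note the request at this layer, then pass it down the stack."""
        trail = [f"{self.name}: {request}"]
        if self.lower is not None:
            trail += self.lower.handle(request)
        return trail


def build_stack(names):
    """Build a stack from a bottom-to-top list of names and return the top layer."""
    top = None
    for name in names:
        top = Layer(name, lower=top)
    return top


if __name__ == "__main__":
    top = build_stack(["bus layer", "function layer", "filter layer"])
    for line in top.handle("read"):
        print(line)
```

A request enters at the top and travels down one layer at a time; no layer talks to anything other than its immediate neighbour.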

Although I am not a hardware engineer, as far as I can tell a concept similar to a software stack does not exist for hardware. Hardware components are definitely combined together, but the combination usually is on a side-by-side basis, not one layer on top of another on top of another etc.

If you search the Internet with "Hardware Stack" you will mostly find items describing hardware support for the stack as a data structure.

Anyway, one day, being in a funny mood, I decided to implement my own hardware stack, or device stack if you prefer. Here's what it looks like!

[Photo: Camerastack]

Btw this is a really expensive stack (five digits in USD).

Have fun!
Dimitris Staikos

Jochen Kalmbach RULES!

Jochen Kalmbach RULES!

I don't need to say much more :-)
Dimitris Staikos

November 20, 2007

Tower of Babel

This post marks the beginning of a new category of posts: Funny Software.

I work in the software business, and I often get into weird situations when trying to do my work or simply have fun with the computer. These situations are also marked as WTF by many, and there are sites that list lots of WTFs. By the way, WTF stands for "What The F..k ?"

So here is my first WTF:

[Screenshot: Tower_of_babel]

Every time I clicked OK, a new line was added :-)

Have fun!

November 17, 2007

Taipei 101

Finally, on my 7th day in Taipei, I found the time for a visit to the famous Taipei 101, currently the tallest building in the world :-)

So I will add it to my list of "Been there, Done that" :-D

As you might know, all basic university classes are marked as 101: Computer Programming 101, Anatomy 101, Microeconomics 101, you name it.
In that sense the tower is indeed Taipei 101.

I was really amazed at how the elevator got us 500+ meters up in 36 seconds, without the slightest sense of acceleration. They even adjust the air pressure as you go up, so that you are not disturbed. On the way down of course I felt the pressure change in my ears.

Moreover it has an open roof that you can visit, and it's really amazing.

Have Fun!

November 09, 2007

PrintScreen magic

Do you want to play a really nasty joke on your good friend?

Well, here's the deal: When a Windows computer has booted in DEBUG mode (/DEBUG and friends in the boot.ini entry), pressing PrintScreen causes a kernel breakpoint, bringing the machine to a complete halt so that the kernel debugger can operate on it. Now, isn't that cool?

If you are not at ease touching BOOT.INI or using the dreaded BCDEDIT in Vista, then use MSCONFIG.EXE at your leisure.

So set your friend's machine to boot in debug mode and sit back, waiting for him to try to get a screenshot. Alt+PrintScreen won't trigger the breakpoint. It happened to me twice before I realized what on earth was happening.

Here is the message in the Kernel Debugger that welcomes you when you connect to a debug PC that has been frozen with PrintScreen:

*******************************************************************************
*
*   You are seeing this message because you pressed the SysRq/PrintScreen
*   key on your test machine's keyboard.
*
*                   THIS IS NOT A BUG OR A SYSTEM CRASH
*
* If you did not intend to break into the debugger, press the "g" key, then
* press the "Enter" key now.  This message might immediately reappear.  If it
* does, press "g" and "Enter" again.
*
*******************************************************************************

The funny thing is that even simple user mode asserts and unhandled exceptions will break into the kernel debugger. So some application crashes, Windows freezes, and your friend is wondering whether he should move to Linux ;-)

Have fun,
Dimitris Staikos

November 04, 2007

Minimum Minimum

Suppose that you have a thingy called something like "Minimum Amount of Time that must elapse after the occurrence of event X before Event Y is allowed to occur".

Now you want to turn this thingy into a setting in your program, so that your users can try out different values and find the ones that suit them best. However, for practical reasons, you will only allow a range of values.

This means that there is a minimum value for this setting and a maximum value.
What do you call the minimum value of this setting?
Naturally it's "The Minimum Minimum amount of time that ...".

Equally naturally its maximum value is the "Maximum Minimum ...".

And if you have a whole list of different events, Y, Z, ..., that may occur after X and each has a configurable minimum elapsed time, what do you call the minimum amount of time after X occurs that any event is allowed to happen?
This is the minimum of the various "Minimum-Minimums", so naturally it's "The minimum minimum minimum amount of time that ...".

And if you have lots of different X events, X1, X2, ...Xn, that define limitations on the elapsed times before other events may occur, what do you call the minimum amount of time after any Xi event occurs before any event is allowed to happen???

By now, if your stack hasn't overflowed, you should be able to realize that it is "The Minimum Minimum Minimum Minimum ...".

Pure common sense, right?
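
For those whose stack did overflow, here is the same thing as a small Python sketch. The function names simply follow the naming above, and the event names and numbers in the usage example are invented.

```python
def clamp_setting(value, minimum_minimum, maximum_minimum):
    """Keep a user-chosen minimum delay inside its allowed range."""
    return max(minimum_minimum, min(value, maximum_minimum))


def minimum_minimum_minimum(ranges):
    """Smallest 'minimum minimum' over all events that may follow one X.

    ranges maps an event name to its (minimum_minimum, maximum_minimum) pair.
    """
    return min(lowest for lowest, _highest in ranges.values())


def minimum_minimum_minimum_minimum(ranges_per_x):
    """Smallest of the above over every X event."""
    return min(minimum_minimum_minimum(r) for r in ranges_per_x.values())


if __name__ == "__main__":
    # Invented ranges, in milliseconds, for events Y and Z after X1 and X2.
    ranges_per_x = {
        "X1": {"Y": (10, 500), "Z": (25, 900)},
        "X2": {"Y": (5, 100)},
    }
    print(clamp_setting(3, *ranges_per_x["X1"]["Y"]))
    print(minimum_minimum_minimum(ranges_per_x["X1"]))
    print(minimum_minimum_minimum_minimum(ranges_per_x))
```

Each extra "Minimum" in the name is just one more `min` wrapped around the previous one.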

Have Fun!