Windows portability pain

11 May 2011
Although I use Linux most of the time for software development, portability to Windows is a requirement. Most of the time this is not a problem as I am either using a tool-kit (usually wxWindows), or some wrapper functions I wrote myself that wallpaper over the slightly different parameter conventions between Unix and Windows. However there are a few occasions when I have to choose between a grossly over-complex but cross-platform design, and simply writing two separate implementations. This week was one of those occasions.

Being a network programmer I spend most of my time writing servers and client ADTs, and my usual mode of operation is to write single-threaded programs that are based around select() calls. Under Linux this function can wait for just about anything, including GUI events if you can get hold of the underlying X11 client handle. Under Windows it can only listen for activity on network sockets. I wanted to write a simple program that sniffed a serial port and sent anything it received to another process for further processing, but I also wanted a control channel back to this sniffer.
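
For illustration, the select()-centred style can be sketched as below. This is a minimal sketch under assumptions rather than the code from the sniffer itself: wait_for_input() and demo_select() are hypothetical names, and two pipes stand in for the serial port and the control socket.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: block until at least one descriptor is readable.
   Returns a bitmask (1 = serial_fd ready, 2 = control_fd ready),
   or -1 on error. */
static int wait_for_input(int serial_fd, int control_fd)
{
    fd_set readable;
    int highest = serial_fd > control_fd ? serial_fd : control_fd;
    int ready = 0;

    assert(serial_fd >= 0 && control_fd >= 0);

    FD_ZERO(&readable);
    FD_SET(serial_fd, &readable);
    FD_SET(control_fd, &readable);

    /* NULL timeout: sleep until something arrives */
    if (select(highest + 1, &readable, NULL, NULL, NULL) < 0)
        return -1;

    /* Sort out which descriptors need servicing */
    if (FD_ISSET(serial_fd, &readable))
        ready |= 1;
    if (FD_ISSET(control_fd, &readable))
        ready |= 2;
    return ready;
}

/* Demonstration with pipes as stand-ins: only the control side has data. */
static int demo_select(void)
{
    int serial[2], control[2];
    int ready = -1;

    if (pipe(serial) != 0)
        return -1;
    if (pipe(control) != 0) {
        close(serial[0]);
        close(serial[1]);
        return -1;
    }

    if (write(control[1], "q", 1) == 1)
        ready = wait_for_input(serial[0], control[0]);

    close(serial[0]);
    close(serial[1]);
    close(control[0]);
    close(control[1]);
    return ready;
}
```

Here demo_select() returns 2, since only the control pipe has data waiting. A real program would wrap the call in a loop and service whichever descriptors are flagged.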

The portable approach

Threads: handle the serial and control socket IO in different threads. This ports well because synchronous IO, where each call only deals with one file/socket/port, is either source-compatible or trivial to rewrite. POSIX threads are standard under Unix, and implementations are available for Windows.

The problem is using threads, particularly getting them to communicate. This is best illustrated by considering what is required to get the whole program to shut down gracefully in response to a quit message. If the IO reads/writes are blocking, then a blocked thread will not see the shut-down notification until it has received some input, which can result in a hung program as it may never actually receive this input. Conversely, if non-blocking IO is used (assuming it is available) you will have threads in tight loops that spend all their time locking and unlocking mutexes, which in practice tends to live-lock the system as context-switches rarely happen while the mutex is unlocked.
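
The blocking half of that problem can be shown in a few lines. The following is a minimal sketch, assuming POSIX threads and C11 atomics; struct reader, reader_thread() and demo_blocked_quit() are hypothetical names, and a pipe stands in for the serial port. The quit flag is raised before any data is sent, yet the thread only gets round to checking it once read() returns.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

struct reader {
    int fd;            /* descriptor the thread blocks on */
    atomic_int quit;   /* set to 1 by the main thread to request shutdown */
    int bytes_seen;    /* bytes consumed before the thread exited */
};

static void *reader_thread(void *arg)
{
    struct reader *r = arg;
    char c;

    assert(r != NULL);

    /* The flag can only be tested between reads; while parked in
       read() the thread cannot react to it. */
    while (read(r->fd, &c, 1) == 1) {
        r->bytes_seen++;
        if (atomic_load(&r->quit))
            break;
    }
    return NULL;
}

/* Returns the number of bytes the reader consumed before it noticed
   the quit request, or -1 if the demonstration could not be set up. */
static int demo_blocked_quit(void)
{
    int fds[2];
    pthread_t tid;
    struct reader r;
    ssize_t sent;

    if (pipe(fds) != 0)
        return -1;
    r.fd = fds[0];
    r.bytes_seen = 0;
    atomic_init(&r.quit, 0);

    if (pthread_create(&tid, NULL, reader_thread, &r) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    atomic_store(&r.quit, 1);         /* request shutdown first... */
    sent = write(fds[1], "x", 1);     /* ...but only input wakes the reader */
    close(fds[1]);                    /* end-of-file, so the reader cannot hang */

    pthread_join(tid, NULL);
    close(fds[0]);
    return sent == 1 ? r.bytes_seen : -1;
}
```

demo_blocked_quit() returns 1: the reader had to consume a byte before it noticed a request that was made earlier. In a real program nothing guarantees that byte ever arrives, which is exactly the hang described above.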

By the time you've sprinkled in a load of empirically-chosen delays and thread yields/sleeps you've got yourself a pile of code that is a headache to maintain and an utter nightmare to debug when it goes wrong. The biggest problem is that the execution trace is affected by the scheduling policy of the operating system, and that is basically non-deterministic.

Windows-native approach

Under Windows, the general-purpose equivalent of select() is WaitForMultipleObjects(), which takes an array of handles, typically event objects. Combined with asynchronous reads and writes that signal those events, this is what Windows calls Overlapped IO, of which a bare-bones demonstration is:

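    /* Manual-reset event, initially unsignalled */
    event[0] = CreateEvent(NULL,TRUE,FALSE,NULL);
    overLapIO.hEvent = event[0];
    /* Start the read; the event is signalled when it completes */
    ReadFile(hdlSocket,ptrBuffer,maxBuffer,&lenBuffer,&overLapIO);
    /* ... */
    /* Handle count comes first, then the handle array */
    valReturn = WaitForMultipleObjects(cntEvent,event,FALSE,INFINITE);
    switch( valReturn ) { /* ... */ }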

One irritation is the subtle difference in treatment between requests that are serviced immediately and those that are not. If ReadFile() is able to complete immediately, it is an error to then do a WaitForSingleObject() or WaitForMultipleObjects() wait on the event object. Under Unix, you simply do the select() call blindly, then sort out the sockets/handles that need servicing. The latter is less of a headache when dealing with lots of things that will 99% of the time not be ready immediately.