2014
02.26

Consider the following code, which might be part of a custom Font class to draw fancy text.

procedure DrawText(Text: String; Position: TRect; Color: dword);

Looks good, right? Parameters are clear, order makes sense, and it’s easy to understand what the code does when you find it invoked:

// some random stuff
Font.DrawText('Hello!',Window.Position,COL_WHITE);
// more random stuff

No need to go searching for the function implementation, or to hover over it in the IDE to figure out what the parameters mean. It’s very clear. But now, say you want to add the ability to draw a “drop shadow” behind the text. No worries, let’s add a boolean parameter:

// Method declaration
procedure DrawText(
   Text: String; Position: TRect; Color: dword; Shadow: boolean);

// Invocation
Font.DrawText('Hello!',Window.Position,COL_WHITE,true);

Hmm. It’s not terrible, and if you use DrawText a lot you’ll soon pick up the meaning, but a little bit of clarity has been lost. You might be smart and make Shadow an optional parameter which defaults to false, to avoid breaking any existing code. But now… you want to add the ability for text to be horizontally and/or vertically centered within the position rect. Let’s see:

// Method declaration
procedure DrawText(
   Text: String; Position: TRect; Color: dword; Shadow,CentreX,CentreY: boolean);

// Invocation
Font.DrawText('Hello!',Window.Position,COL_WHITE,true,true,false);

OK, at this point the function’s invocation is no longer readable by itself. Worse, no matter how many times you use it, you’re bound to occasionally forget the order of those parameters. I should know: the above function was ripped directly out of my old custom bitmap-font class. But when I needed to add word-wrapping and other alignment options, it had to go.

In general, if the meaning of a boolean parameter isn’t made totally obvious by the function name, it might not be the most readable solution. One of the golden rules of programming: it’s harder to read code than to write it, so any extra effort that makes code easier to read later is time well spent.

Use Sets as parameters!

It’s a little bit more setup, but you (and anyone else reading your code) will thank you if you take the time to implement a set. (If your language doesn’t support sets, constant flag values achieve the same result; they’re very slightly less readable, and they don’t protect you against invalid values being passed.)

// Type declaration
TTextStyles      = (
   TS_DropShadow ,
   TS_CentreX    ,
   TS_CentreY    ,
   TS_WordWrap   );

TTextStyle       = set of TTextStyles;

// Method declaration
procedure DrawText(
   Text: String; Position: TRect; Color: dword; Style: TTextStyle);

// Invocation
Font.DrawText('Hello!',Window.Position,COL_WHITE,[TS_DropShadow,TS_CentreX]);

It’s instantly obvious what the invocation does; no need to look up parameters. A bonus feature is that the set of available styles can easily be extended without breaking any existing code.

The main downside is that it’s harder to generate set values inline than it is to pass boolean expressions, so if the arguments to your particular method tend to change depending on conditional evaluation, it can be a bit of a pain. But in this example, the readability benefit is worth the extra effort during declaration.
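
For instance, here’s a minimal sketch of the conditional case (Shadowed and Wrapped are hypothetical booleans, not part of the original example). With boolean parameters you could pass the conditions directly; with a set, it has to be assembled first:

var
   Style: TTextStyle;
begin
   Style := [];
   if Shadowed then Include(Style, TS_DropShadow); // add styles one by one
   if Wrapped  then Include(Style, TS_WordWrap);
   Font.DrawText('Hello!', Window.Position, COL_WHITE, Style);
end;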

2014
02.22

This post is kind of an open question. I’m hoping that I can get some comments, any comments, on it 🙂

Let’s set up a situation with a problem, and look at different solutions for addressing the problem. Which one do you think is best, and more importantly, why?

The Setup

There is a multithreaded application which offloads work items to a worker thread, and uses a TEventObject.SetEvent call to signal to the main thread “hey, all done”. Note: the worker thread doesn’t just complete this one task and then exit; it sleeps until another work item is generated.

The Problem

We want the main thread to wait for either a Windows message to be received, OR for the worker thread’s event to be signalled. Win32 provides a handy function for doing exactly this: MsgWaitForMultipleObjects, called with the event object’s handle and QS_ALLINPUT as the wakemask.

However, FPC’s implementation of TEventObject doesn’t expose the OS handle, so we can’t call Win32 functions on it. (TEventObject inherits from THandleObject, whose reason for existence is encapsulating an operating system handle; however, its actual “Handle” property is a pointer, and the documentation explicitly states that TEventHandle is an opaque type and should not be used in user code.)

From looking at the actual implementation of TEventObject, however, you can determine that the Windows handle we need is the very first member of the structure pointed to by the “Handle” property.

TEventObject.WaitFor() is clearly the intended usage, but it blocks on that event only and doesn’t wake up if window messages are received, so it doesn’t address the situation.

(Note: it makes perfect sense why TEventObject abstracts the Windows handle; otherwise it would be too easy to break cross-platform source compatibility, which is something FPC tries really hard to maintain. However, in this scenario the project is locked to Win32 for a number of reasons, so we should take advantage of its ability to wait for both events and messages without polling if we can.)

The Options

Here are the options as I see them (I may have missed some!):

  1. Reinvent: Reimplement the whole TEventObject functionality, which will be mostly line-for-line identical except that the Windows handle will be an accessible property instead of hidden behind an opaque pointer type.
  2. Typecast: Apply the typecast HANDLE(TEventObject.Handle^) to grab the Windows handle out of the opaque type, disable the warning, and just hope the implementation behind the opaque type doesn’t change in a future version of FPC (see the sketch after this list).
  3. Monitor: Create a third thread whose only purpose in life is to call TEventObject.WaitFor. Then, back in the main thread, call MsgWaitForMultipleObjects on the third thread’s TThread.Handle (its implementation is different: the TThread.Handle property IS a directly usable Windows handle, no typecast required).
  4. Poll: Call TEventObject.WaitFor with a small timeout of, say, 10 milliseconds, and pump window messages between waits.
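
To make option 2 concrete, here’s a minimal sketch of how the wait might look. Event is an assumed TEventObject instance, and the cast relies entirely on the undocumented layout described above; note also that some Windows header versions declare the handle parameter of MsgWaitForMultipleObjects differently (you may need @hEvent):

var
   hEvent: THandle;
begin
   // UNDOCUMENTED: assumes the Win32 event handle is the first member of
   // the structure behind TEventObject's opaque Handle pointer.
   hEvent := PHandle(Event.Handle)^;

   // Wake on either the event being signalled or any window message.
   case MsgWaitForMultipleObjects(1, hEvent, false, INFINITE, QS_ALLINPUT) of
      WAIT_OBJECT_0    : ; //...worker signalled "all done"...
      WAIT_OBJECT_0 + 1: ; //...a window message arrived; pump the queue...
   end;
end;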

Discussion Please!

In my view:

  1. Reinvent is the safest, but it also doesn’t feel great; it means turning our backs on FPC’s otherwise simple and elegant synchronization objects.
  2. Typecast is the easiest and results in exactly the behaviour that we want, but “it’s not guaranteed safe forever”.
  3. Monitor is safe, but burning an entire extra thread seems like overkill, and maybe worse than polling in #4.
  4. Poll works, but eats a lot of unnecessary context switches if the application is just idling and there’s no work going on. (A Sleep(10) in an idle loop consumes about 1% of an entire logical processor just in context switches.)

Would love to hear your thoughts about how you would approach this. Remember, the issue isn’t that the problem can’t be resolved (there are at least four solutions above, and maybe more I’ve missed); it’s picking the ‘best’ solution, so your thoughts about why you would recommend a particular approach are the most valuable.

2014
02.16

So, using thread Suspend and Resume functionality is deprecated in Delphi and Freepascal, and even Microsoft’s own MSDN documentation warns against using it for synchronization.

There are good reasons for trying to kill this paradigm: suspending and resuming other threads from the one you’re currently on is a deadlock waiting to happen, and it’s typically not supported at all in OSes other than Windows. The only circumstance where it’s needed is to start execution of a thread that was initially created suspended (to allow additional initialization to take place). This is still supported, and a new method called TThread.Start has been added to FPC/Delphi to implement it.

However, a number of people are confused about how to correctly re-implement their “worker thread” without using suspend/resume; and some of the advice given out hasn’t been that great, either.

Let’s say you have a worker thread which normally remains suspended. When the main thread wants it to do something, it pushes some parameters somewhere and then resumes the thread. When the work is complete, the thread suspends itself. (Note: critical sections or other access protection on the “work to do” data need to be here too, but are omitted for clarity):

{the worker thread's execution}
procedure TMyThread.Execute;
begin
   repeat
      if (work_to_do) then begin
         //...do_some_work...
      end else
         Suspend;
   until Terminated;
end;

{called from the main thread:}
procedure TMyThread.QueueWork(details);
begin
   //...add_work_details...
   if Suspended then
      Resume;
end;

Although the particular example above still works, now’s a good time to go ahead and clean this up so that you’re not depending on deprecated functions.

Here’s where we get to the inspiration for today’s post. The suggested ‘clean up’ is often implemented using polling. Let’s take something I saw suggested on stackoverflow as a replacement for the above:

procedure TMyThread.Execute;
begin
   repeat
      if (work_to_do) then begin
         //...do_some_work...
      end else
         Sleep(10);
   until Terminated;
end;

procedure TMyThread.QueueWork(details);
begin
   //...add_work_details...
end;

Yuck! What’s the problem with this design? It’s not particularly “busy”, since it spends almost all its time asleep, but there are issues. Firstly: if the thread is idle, it can take up to 10 milliseconds to realize there’s any work to do. Depending on your application that may or may not be a big deal, but it’s not exactly elegant.

Secondly (and this is the bigger one for me), this thread is going to eat 200 context switches per second (two per 10ms sleep: one to wake up and check, one to go back to sleep), whether busy or not. A far worse design than the original! Context switches aren’t free: if we assume 50,000 nanoseconds per context switch (0.05ms), which is a reasonable ballpark, then 200 of them per second just ate 1% of the total capacity of a processor core, achieving nothing except waiting. There’s a better solution, right?

Use Event Objects

Fortunately, there are better ways than sleeping and polling. The best replacement for the above scenario is simply to deploy an event. Events can be “signalled” or “nonsignalled”, and you can tell the operating system “hey, I’m waiting for this event”. It will then go away and not waste any more cycles on you until the event is signalled. Brilliant! How do you do this? Well, it depends on your language:

  • Win32 itself exposes event handles (see CreateEvent) which can be waited on with the WaitFor family of calls
  • Freepascal provides a TEventObject, which encapsulates the Win32 (or other OS equivalent) functionality
  • Delphi uses TEvent, which does the same thing
  • C# uses System.Threading.ManualResetEvent (and related)

Here’s how to rewrite the above handler using an event object, so that it consumes no CPU cycles until work arrives. (I’ll use the FPC mechanism, but the others are functionally identical.)

constructor TMyThread.Create;
begin
   //...normal initialization stuff...
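   // Manual-reset event (2nd argument), created nonsignalled (3rd argument)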
   mEvent := TEventObject.Create(nil,true,false,'');
end;

{the worker thread's execution}
procedure TMyThread.Execute;
begin
   repeat
      mEvent.WaitFor(INFINITE);
      mEvent.ResetEvent;
      if (work_to_do) then begin
         //...do_some_work...
      end;
   until Terminated;
end;

{called from the main thread:}
procedure TMyThread.QueueWork(details);
begin
   //...add_work_details...
   mEvent.SetEvent;
end;

Presto! The thread will happily wait until a new piece of work comes in, without consuming any CPU cycles at all, and it will respond immediately once there’s something to do. The only caveat of this design? The only way out of the WaitFor() call is for the event to be signalled, so you also need to account for this when you want to terminate the thread for good. (Note that FPC’s TThread.Terminate isn’t virtual, so we have to cast to TThread to get the correct call):

procedure TMyThread.Terminate;
begin
   // Base Terminate method (to set Terminated=true)
   TThread(self).Terminate;

   // Signal event to wake up the thread
   mEvent.SetEvent;
end;
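
A typical shutdown sequence from the main thread would then look something like this (MyThread being an assumed instance of TMyThread):

MyThread.Terminate; // Sets Terminated and signals mEvent (as above)
MyThread.WaitFor;   // Wait for Execute to fall out of its loop
MyThread.Free;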

Sorted!

2014
02.12

In Windows Vista and earlier, the operating system managed a list of audio (playback) devices, and one of them was always specified as the ‘default’. This is how audio devices may have looked on such a system:

  1. Speakers (Default)
  2. USB Headset
  3. Realtek rear audio jack

Applications using Windows’ built-in sound functions such as PlaySound() always played to the default device. Games and other more sophisticated software could enumerate the available audio devices and allow the user to choose, while still typically selecting whichever device was marked as ‘default’ if the user didn’t express a preference.

Windows 7 changed the game a little bit, by introducing a new, virtual device called the “Default Audio Device”. If a user running Windows 7 on the same hardware as above opens up Control Panel and goes into Sound, they will still see the same three devices, so nothing appears to have changed. However, applications that enumerate the available devices now see something different:

  1. Default Audio Device (Default)
  2. Speakers
  3. USB Headset
  4. Realtek rear audio jack

In the above scenario, the speakers are still the default *physical* device (and the device that the user sees as Default in Control Panel), however the new, virtual device is the default device from an application’s point of view. If an application selects either device 1 or 2 in the list above, both will play to the speakers. So, why implement this feature and change the relationship between what the application sees and what the user sees?

The primary benefit is that the Default Audio Device automatically remaps to the default physical device whenever it changes, *instantly*. On previous versions of Windows, if you did something like start playing a song in WinAmp with the Speakers as the default audio device, then changed the default to the USB Headset, your music would continue to play through the speakers until the start of the next song (or whenever the application closed and reopened its audio channel).

Implementing Support in Applications

I strongly, strongly encourage you to ensure you support playback to the Default Audio Device in your application. Having a program or game that *doesn’t* switch automatically these days is enough to make me stop using it (looking at Diablo III here!). If you’re using the awesome BASS Audio Library (which I continue to endorse wholeheartedly), here’s how you do it:

BASS_SetConfig(BASS_CONFIG_DEV_DEFAULT,1); // Use Default device (value is a DWORD, so 1 = true)
BASS_Init(-1,Frequency,0,0,nil);           // Initialize audio

Now, if WinAmp is playing to the (virtual) Default Audio Device and the physical default changes, the music switches to the new device *instantly*. WinAmp doesn’t have to poll, register a callback event, or do anything at all. For users who switch between devices several times a day, support for this feature is a godsend!

Changing the Default Device Programmatically

The only problem? Microsoft haven’t provided any way to programmatically change the default audio device; they really, really want you to do it manually through the Control Panel each time. That’s really painful; the best solution I have found so far is using scripting to simulate opening the Control Panel and changing the defaults, similar to what’s described here. Not that elegant, but it gets the job done.

2014
02.07

This topic seems to come up quite a lot, and yet the majority of the time there’s no clear answer. Take the scenario: you have a game or an application, and there’s some downtime. You’ve noticed in Task Manager that your app remains at a high CPU usage % even while it’s not doing anything, and you want to reduce it somehow. Which way is best?

This is where the questions start: is Sleep(0) better than Sleep(1)? How about Sleep(50)? Or WaitForSingleObject, WaitForMultipleObjects, or MsgWaitForMultipleObjects? Should I use the (deprecated) multimedia timers? Where does WaitFor() come in?

More often than not, answers to these questions describe how these functions *shouldn’t* be used, but a lot of the time they skip or gloss over an actual proposed alternative. The correct answer, of course, is ‘it depends’: on the type of application you’re writing, the situation you’re in, and what you’re waiting *for*. With that in mind, let’s look at some concrete answers for common situations!

Games

For games, the answer is actually pretty simple: your main thread almost never wants to sleep or wait for anything! There are two common scenarios. If you have a fullscreen 3D or action game: never wait, just continually run the game loop and enjoy the improved FPS. If you have a puzzle, card or strategy-type game, you may want to lock the framerate to the screen’s refresh rate. In that case, the best (and only) place for your application to wait is inside Direct3D’s Present() method, after calling Device.Reset() with PresentationInterval = D3DPRESENT_INTERVAL_DEFAULT. There it will happily render the scene while it waits for the appropriate time to elapse, and not waste any CPU cycles.
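
Here’s a minimal sketch of that setup using the Pascal Direct3D 9 headers (Device and pp are assumed names; only the fields relevant to vsync are shown, and error handling is omitted):

var
   pp: TD3DPresentParameters;
begin
   FillChar(pp, SizeOf(pp), 0);
   pp.Windowed := True;
   pp.SwapEffect := D3DSWAPEFFECT_DISCARD;
   pp.PresentationInterval := D3DPRESENT_INTERVAL_DEFAULT; // sync to refresh rate
   Device.Reset(pp);

   // Each frame: Present() blocks efficiently until the vertical retrace
   Device.Present(nil, nil, 0, nil);
end;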

If you have a windowed game and it gets minimized, you again have two choices: either just keep the game running, or pause the game and wait for it to be restored. To keep running, just ignore the minimize event! If you want to wait efficiently until the player restores the window, the best way is something like this:

while (GameWindow.Minimized) do begin
   WaitMessage;         // Wait until new message arrives
   PumpWindowMessages;  // Process all pending WM_ messages 
end;
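
Note that PumpWindowMessages isn’t a Win32 call; it’s a small helper you’d write yourself. A minimal sketch:

procedure PumpWindowMessages;
var
   Msg: TMsg;
begin
   // Drain everything currently queued, dispatching to the WindowProc
   while PeekMessage(Msg, 0, 0, 0, PM_REMOVE) do begin
      TranslateMessage(Msg);
      DispatchMessage(Msg);
   end;
end;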

Assuming your WindowProc handles WM_ACTIVATE correctly to restore the game window’s state, this is all you need! Your application will receive exactly zero CPU time until a message arrives for it, it will immediately handle any messages that do come in (thus remaining responsive), and the instant the application is restored, normal play will resume.
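
The loop also relies on GameWindow.Minimized being kept up to date. Here’s one way it might be maintained (a sketch; GameWindow is an assumed window wrapper, and tracking WM_SIZE works just as well as WM_ACTIVATE for this particular flag):

// Inside the WindowProc (sketch):
WM_SIZE:
   case wParam of
      SIZE_MINIMIZED               : GameWindow.Minimized := true;
      SIZE_RESTORED, SIZE_MAXIMIZED: GameWindow.Minimized := false;
   end;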

Waiting for a worker thread to finish

No contest here: this is exactly what the WaitFor() function was designed for. Use it! Note that “WaitFor” isn’t a Win32 API; rather, a version of this function is implemented by a wide variety of languages with threading support. FPC’s Windows implementation of TThread.WaitFor essentially does the following; if your chosen language doesn’t implement WaitFor, design something like this:

repeat
   Outcome := MsgWaitForMultipleObjects([ThreadHandle, SynchronizeEvent],
                                        INFINITE, QS_SENDMESSAGE);
   case Outcome of
      WAIT_OBJECT_0   {ThreadHandle}    : ExitFlag := true;
      WAIT_OBJECT_0+1 {SynchronizeEvent}: CheckSynchronize; // Execute Synchronize() calls
      WAIT_OBJECT_0+2 {a WindowMessage} : PeekMessage(PM_NOREMOVE);
   end;
until (ExitFlag);

In other words, it calls MsgWaitForMultipleObjects with a handle to the thread being waited for, plus a global handle that is signalled whenever a thread calls Synchronize() on a method, and specifies that the wait should also end if a message is received from another thread or process. If the thread ending was the outcome, the function exits; otherwise it either executes the synchronized method or peeks at the window messages, and starts waiting again. (If there’s a GUI, the application may wish to alter this behavior by waiting on QS_ALLINPUT and actually dispatching window messages instead of just peeking.) Beware: if you’re waiting for a worker thread *from* the main thread, and the main thread also controls the GUI, make sure your language’s implementation includes a way to pump window messages during WaitFor(), or your GUI will stop responding until the thread eventually completes.

The importance of the SynchronizeEvent wait is that it allows other threads (including, potentially, the thread you’re waiting for) to execute Synchronize() procedure calls on the main thread while you wait (a mechanism similar in spirit to an Asynchronous Procedure Call, or APC). If this wasn’t included, attempting a synchronized procedure call could cause a deadlock.
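
To illustrate the deadlock risk, here’s a sketch (TWorker and UpdateProgress are hypothetical; UpdateProgress is a method that must run on the main thread):

procedure TWorker.Execute;
begin
   //...do_some_work...
   // Synchronize blocks until the MAIN thread has executed UpdateProgress.
   // If the main thread is stuck in a wait that never services
   // SynchronizeEvent, neither thread can proceed: deadlock.
   Synchronize(@UpdateProgress);
end;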

GUI application waiting for user input

WaitMessage. Very simple!

Non-GUI thread in an application that needs to poll something

This is where it gets tricky. Applications should *try* to avoid polling whenever they possibly can, because by definition it wastes CPU cycles: waking up a thread just to check one value and then going back to sleep wastes two context switches’ worth of processor time every time. So, if you’re in control of the information you’re polling for, rewrite it as an alert or event mechanism (using a callback function, a custom window message, or a waithandle; whatever works for you!) so that you can just wait until it’s done. However, if you’re not in control of the source, sometimes you have to poll.

while not (<check some event>) do
   Sleep(50);

This works best in this situation. 50ms is short enough to remain responsive when the event happens, but generous enough that the OS will be able to give several timeslices to other processes before it comes back to you. Note that internally, Windows implements Sleep() using a timeout waithandle and WaitForSingleObject(), so doing that yourself achieves the same thing with more calls.

GUI thread in an application that needs to poll something

Similar to above, except that you need to be more responsive. You don’t want to wait 50 milliseconds to respond to window messages; that’s long enough to potentially cause irritation. So what we want is something like WaitMessage, but with a timeout. Here’s how:

while not (<check some event>) do begin
   MsgWaitForMultipleObjects(0,nil,false,50,QS_ALLINPUT);
   PumpWindowMessages; 
end;

Here, we call MsgWaitForMultipleObjects using a timeout of 50 milliseconds and telling it we’re interested in all window messages, but we don’t pass it any objects. This means the function will wait until either the 50 milliseconds have elapsed, OR any window message is received (which we then want to pump immediately, being a GUI thread). Then, as long as we’re awake, we check the polling condition before going back to sleep.

That’s it! A bunch of common situations, and the appropriate and graceful way to wait, sleep or yield your application. Hopefully this helps someone! Oh wait, I forgot something:

Sleep(0) and Sleep(1)

So, it turned out that in all of our situations, we never needed these. But they’re talked about quite a lot, so they must do something. What do they do?

Well, Sleep(0) yields the remainder of the current timeslice, but the thread remains ready to execute. This means that if the OS has no other threads or processes waiting to do something, it will return immediately. If there are other threads waiting, our thread will sleep and it’ll come back as soon as the scheduler finds time for it. The net effect? If it’s the only thing running on that processor, CPU Usage % on task manager will remain high. If other stuff is running, it will compete with that to run whenever possible.

Outcome? Not much is accomplished. If your application is actually busy doing something, you don’t want to randomly give up your timeslice. The scheduler will pre-empt you when it needs to: trust it! As long as it’s given you the processor, you should be using it. “Ending early” like this just generates needless context switches, with the result that your computer is slightly less efficient than it was before. If your application is *not* busy doing something, then you’re waiting! And you should use either the “wait for input” or the “wait and poll” methods described above. Don’t just sleep for tiny amounts of time.

Sleep(1) is slightly different. The 1 millisecond tells the scheduler “do not come back to me for at least 1 millisecond” (and it might be more like 5 or 10). Net result? This will reduce the CPU usage % in Task Manager even if yours is the only thing running, because you’re forcing the OS to ignore you temporarily. So it’s slightly ‘better’ in that sense. However, the question remains: why are you doing this? If your application is actually busy, you shouldn’t be sleeping. And if your application is actually waiting, you should use one of the waiting paradigms described above.

If your application is “just a little bit busy, but you know, nothing important”, what you should be doing is using a worker thread and setting it to a lower priority. That lets the OS scheduler control exactly when to allocate processor time. After all, even if the work isn’t important, doing it is better than doing nothing at all; if the scheduler is able to give you that time, you may as well use it.
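
In FPC/Delphi that’s a one-liner on a TThread descendant (WorkerThread is an assumed instance):

WorkerThread.Priority := tpLower; // or tpIdle for truly opportunistic work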

The final scenario where this might be tried is “my application needs to do something every 1 millisecond; otherwise it’s idle”. Sorry, but that approach is doomed: Sleep(1) guarantees at least a 1ms wait, and it could be (and probably will be!) much more. If you really need this precise behavior, you need something better. That’s out of scope for this article though; it’s long enough already!

2014
02.04

A short and simple post for today. Playing a video file in Freepascal, without using Lazarus.

I honestly couldn’t find any examples on the internet of how to do this, so I made one.

The DirectX 9 SDK has examples of how to use DirectShow/ActiveMovie to play back video files in C++; however, the common version of the DirectX headers for Delphi/FPC (Clootie’s graphics pages) doesn’t include the DirectShow headers, seemingly due to ActiveX dependencies.

A lot of people might use video playback only once or twice in their whole application: playing a logo, or an end-of-game cutscene, and that’s it. Having to familiarise yourself with COM, ActiveX and DirectShow in FPC seems like a lot of overhead just to achieve something so simple!

So, I created a handy class that takes care of all of that work and presents an extremely simple interface to anyone wanting to play video. Here’s how it works:

var
   VideoPlayer: TVideoPlayer;

begin
   VideoPlayer := TVideoPlayer.Create;
   VideoPlayer.PlayVideo('somevideofile.avi');
   VideoPlayer.Free;
end;

Download source code (including precompiled exe and sample video file) from the guide page.

The source code also shows you where to insert other code, if you want to do something useful while the video is playing (like loading other assets) or respond to user input.

2014
02.01

When your project is nearing completion, it’s time to look at optimization. For Direct3D, this typically means grouping and reordering work so that the hardware operates as efficiently as possible.

For Direct3D 9, Microsoft have published a handy guide of things to consider optimizing here. But it’s interesting to look at the difference these changes make in the real world, to a real game on real PCs: especially on slower PCs!

The real-world case in this blog post is based around my Five Hundred game. Although it’s a 2D game, it’s pure Direct3D 9 code, so most of the optimizations still apply. While developing the game, I was mentally building a list of things “to do later” to help optimize. Note that it’s not a hugely demanding game, so the goal wasn’t to squeeze an extra frame per second out of a modern system. The primary objective was to make the game playable even on low-end, integrated-graphics, XP-based systems, since I imagine a lot of the card-playing public might well have those systems lying around.

This was my list of things to do, in (what I thought would be) biggest-to-smallest gain:

  • Minimize texture changes per frame (e.g. draw all of the card backs at once)
  • Bundle up objects into the same vertex buffers, minimizing DrawPrimitive calls
  • Change renderstates to only use alpha-transparency for the textures that use it

I implemented all of the above, but as an interesting side project, I ran full evaluations after making each change (and with each change in isolation from the others) across all my test systems to see where the benefits were. The results? Curious!

Optimization Results

The benefits of each optimization varied greatly between different systems! To summarize, I’ll group them into three categories: “high-end” (in this case, a GeForce 770 with 4GB); “mid-range” (a Mobile Radeon 1GB) and “low-end” (Intel Express Q35 GMA integrated graphics).

Optimization               High-End   Mid-Range   Low-End
Minimize texture changes   45%        25%         5%
Group vertex buffers       15%        40%         0%
Minimize alpha             0%         20%         80%

What does this mean? Primarily, that Microsoft’s suggested optimizations work well for high-end systems. Minimizing texture changes and vertex buffer calls added substantial performance benefits on those systems; collectively, this is “reducing overhead” stuff. It’s worth noting that these systems blaze through the Five Hundred game’s drawing without breaking a sweat anyway: the 45% performance gain for minimizing texture changes on the high-end system was going from roughly 2,000 FPS to 3,000 FPS.

However, doing those things had almost *no* benefit on the low-end system. I suspect this is because the hardware there is fully occupied with the drawing operations themselves; it didn’t matter whether the calls were made more efficiently, because the hardware was already maxed out. The one change that benefited the Intel GMA the most actually ran contrary to Microsoft’s recommendation of “minimize state changes as much as possible”. My testing concluded that it is extremely beneficial on low-end hardware to turn alpha-transparency off whenever it’s not used, *even for just a single draw call before re-enabling it*. The high-end system, in contrast, didn’t care in the slightest whether it was rendering alpha-transparency or not, and took no benefit at all (but didn’t suffer from the extra render state changes either).
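
In code, that means enabling blending only around the draws that need it (a sketch using the Pascal Direct3D 9 headers; Device and the draw calls are assumed):

// Opaque geometry: blending off
Device.SetRenderState(D3DRS_ALPHABLENDENABLE, 0);
//...draw opaque quads (card faces, table background)...

// Blended geometry: enable alpha only for the draws that need it
Device.SetRenderState(D3DRS_ALPHABLENDENABLE, 1);
Device.SetRenderState(D3DRS_SRCBLEND,  D3DBLEND_SRCALPHA);
Device.SetRenderState(D3DRS_DESTBLEND, D3DBLEND_INVSRCALPHA);
//...draw alpha-blended sprites (shadows, highlights)...
Device.SetRenderState(D3DRS_ALPHABLENDENABLE, 0); // off again immediately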

The conclusion: for the best performance across a range of systems, you have to optimize everything. Just because an optimization doesn’t appear to speed anything up on *your* system doesn’t mean there isn’t a whole class of users out there who will feel the benefit!