2014
02.01

When your project is nearing completion, it’s time to look at optimization. For Direct3D, this typically means grouping and reordering of things to make the hardware work as efficiently as possible.

For Direct3D 9, Microsoft have published a handy guide of things to look at optimizing here. But it’s interesting to look at the difference these changes make in the real world, to a real game on real PCs: especially on slower PCs!

The real world case in this blog is based around my Five Hundred game. Although it’s a 2D game, it’s pure Direct3D 9 code, so most of the optimizations still apply. While developing the game, I was mentally building a list of things “to do late” to help optimize. Note that it’s not a hugely demanding game, so the goal isn’t to squeeze out an extra frame per second on a modern system. The primary objective was to make the game playable on even low-end, integrated graphics, XP-based systems, since I imagine a lot of the card-playing-public might well have those systems lying around.

This was my list of things to do, in (what I thought) would be biggest-to-least gain:

  • Minimize texture changes per frame (e.g. draw all of the card backs at once)
  • Bundle up objects into the same vertex buffers, minimizing DrawPrimitive calls
  • Change renderstates to only use alpha-transparency for the textures that use it

I implemented all of the above, but as an interesting side-project, I ran full evaluations after making each change (and with each change in isolation of the others) across all my test systems to see where the benefits were. The results? Curious!

Optimization Results

The benefits of each optimization varied greatly between different systems! To summarize, I’ll group them into three categories: “high-end” (in this case, a GeForce 770 with 4GB); “mid-range” (a Mobile Radeon 1GB) and “low-end” (Intel Express Q35 GMA integrated graphics).

Optimization Results
Optimization High-End Mid-Range Low-End
Minimize texture change 45% 25% 5%
Group vertex buffers 15% 40% 0%
Minimize alpha 0% 20% 80%

What does this mean? Primarily, Microsoft’s suggested optimizations work well for high-end systems. Minimizing the texture changes and vertex buffer calls added substantial performance benefits on those systems. Collectively, this is “reducing overhead” stuff. It’s worth noting that these systems blaze through the Five Hundred game drawing without breaking a sweat anyway. The 45% performance gain for minimizing texture changes on the high end system was going from 2,000 FPS to 3,000 FPS.

However, doing those things had almost *no* benefit on the low-end system. I suspect this is because the hardware is fully occupied with the drawing operations there; so it didn’t matter if the calls were being made more efficiently, the hardware was maxed already. The one change that benefit the Intel GMA the most was actually contrary to Microsoft’s recommendation of “minimize state changes as much as possible”. My testing concluded that it was extremely beneficial for low end hardware to turn alpha-transparency off whenever it’s not used, *even for just a single draw call before re-enabling*. The high-end system, in contrast, didn’t care in the slightest whether it was rendering alpha-transparency or not, and took no benefit at all (but didn’t suffer from the render state changes either).

The conclusion: for best performance across a range of systems, you have to optimize everything. But just because an optimization doesn’t appear to speed anything up on *your* system, doesn’t mean there isn’t a whole class of users out there who will feel the benefit!

2014
01.25

Consider this pseudocode:

Initialize COM;
Initialise Direct3D device;
Open Direct3D Window;

while (Program Window Open)
{
   VertexBuffer := D3DDevice.CreateVertexBuffer(size=100);
   VertexBuffer.Release();
   PumpWindowMessages;
}

Close Direct3D window;
Release Direct3D;
UnInitialize COM;

Is there a memory leak in the above code? This is just pseudocode, so I’m not trying to be tricky with syntax and I haven’t omitted any key steps in init or shutdown. The answer I should get is “no”. And yet, in reality, the answer is “maybe”.

All the code does is allocate a vertex buffer (which is the first step of any drawing operation in Direct3D), and then immediately release the buffer. Simple. There’s no memory leak here in the application. There’s no memory leak in Direct3D.

In the final stages of preparing my Five Hundred game for release, I started running test builds on all of the PCs I could get my hands on: everything from a Win7 Core i7 12GB ram GeForce 770, Surface Pro Win8 tablet, down to an old Core 2 Duo with Intel Q35 Express integrated graphics (i.e. an old low end office PC).

I noticed something very weird on the Core 2 Duo system. In Task Manager, over time, the Working Set (memory usage) continually increased. FPC’s heap trace tools showed no leaks. DXSpy showed all of the Direct3D objects were being constructed and released correctly. Working Set usage stayed static on all the other systems, only increasing on this one. What was going on?

Shortcutting past a few hours of debugging: I boiled down the problem to the minimal steps to reproduce, which was essentially the code above. It turns out that this particular Direct3D driver leaks memory: from the application’s perpective, and DirectX’s perspective, everything is all good, but there’s a reference remaining in the driver that will never be cleaned up until the application ends. The conclusive proof was that when using the “reference rasteriser” (where D3D just runs a software emulation instead of using the hardware drivers) on the very same system, the problem just disappears.

After a lot of time spent searching the web, it turns out I wasn’t the first one to come across this issue, but every suggestion I could find just basically told the questioner “you’re obviously not releasing your resources correctly. check again”, and that was the end of the discussion. While that’s always something you should check first, and check very thoroughly, it’s not *always* the case. I wanted this on the net and hopefully to show up in search results in case anyone in future comes across this, they can prevent the hours spent to figure out the problem wasn’t in their code.

Fortunately, the memory leak is avoidable. In an optimal program, you shouldn’t really be allocating and releasing vertexbuffers all the time: the goal is to allocate as many as you need and then just re-use them through the application. I had planned to do this anyway, so finding this just raised the priority on that for me. And it’s not the end of the world: at least it’s not D3D hardware resources being wasted, just working set, which will eventually be swapped out because it’s never touched again. It’s irritating though.

Those Intel chipsets are out of support, so the last driver update was 2009 and this will never be fixed. There are a large number of PCs out there however, because those chipsets were basically used in almost every XP-era system that didn’t ship with a dedicated graphics card (including millions of PCs for offices). Support for those systems is essential for me for a Five Hundred card game, because I imagine a lot of the potential players might well be people that haven’t invested in the latest gaming-spec systems.

Next time (and I now plan to write articles every Sunday for this blog from now on!), I will talk about the effect of different code optimizations for low-end (these old XP, Intel GMA system) and high-end (GeForce 770) systems for Direct3D. I was quite surprised by my findings!

2014
01.19

After a delay of way longer than expected, I am happy to announce that the redesigned Five Hundred game V1.00 is now available!

Five Hundred game

Five Hundred released!

I had hoped to have this done several months ago, but I kept getting sidetracked into adding extra features. Some of the new things you can see in this game:

  • User customisable cardsets, cardbacks and tables
  • You can use cardsets from any of the most popular formats on the internet: Reko, RKP, and WizSolitaire cards. Download 1,000s of custom cardsets from www.rekonet.org
  • Vastly improved opponent artificial intelligence from the original version
  • Should work on any computer with DirectX9.0c or higher, and Windows XP or better. (any computer that’s too new for the original version!)

Of course, the first version doesn’t mean development is complete. I’ve tested it on as many PCs as I can get access to, but a wider audience may introduce new bits and pieces to address. Please feel free to email me with any issues, comments, feedback or suggestions for newer versions.

I’m already hard at work on adding new improvements for the next release. Currently in the development process is:

  • Six different opponents, which influence both bidding and playing styles
  • Network play – play with or against your friends on a LAN or over the internet!
  • Misere and Open Misere variants (as well as other variants, if anyone suggests or asks for them.. let me know how they work!)
  • Make it work more smoothly on touch displays on Windows 8

Enjoy! You can download the game from here.

2013
06.06
Five Hundred game screenshot

Five Hundred game

Update: Version 1.00 now available – get it from this link!

Wow, far, far too long between posts!

Icefall development is currently on hold for a little while as… the problem is… I’m not finding it as fun to play as it should be. So I think I need to take a break and focus on something smaller and easier for a little while.

Out of all my past projects, my “X500” game is the one I still get the most comments and emails about. Back in the day, I wanted to play the Five Hundred card game, but, looking around the ‘net, decent implementations were pretty much non-existant. So I created a version and released it online. Surprisingly, it actually was fairly well received and became quite popular (Completely Free Software gave it a “5 stars award” which was pretty cool).

Unfortunately, back at that time I was working on a game/graphics library that was source-compatible between DOS and Windows/DirectX versions… (looking back, I have no idea why I clung to DOS compatibility for so long.. oh well). As a consequence, I couldn’t implement a 100% compatible DirectX implementation for the Windows version… and as a result,  it no longer runs currently on modern Windows versions.

So, time for a rewrite! Looking around the internet, I came to the disappointing realisation that there are still no really good Five Hundred games out there! So I’m currently putting the finishing touches on a redesign (early screenshot of the new version above) for Windows XP, Vista, Windows 7, and 8 PCs (yes, it will even support touchscreen play since I recently bought a Surface Pro tablet for testing).

I also want to take the opportunity to improve some of the AI from the old version. I hope to release the final version by the end of June!

2012
01.01

Let’s take a look at pathfinding. Pathfinding is the process where the game can determine a suitable or optimal path from one point to another by traversing the game’s terrain. It’s a critical part of almost all games (RPGs, RTSs, turn-based strategy.. basically anything that has a variable terrain or map is going to need a way to determine paths across it)

Pathfinding is hard in the sense that it takes a lot of computing power to accomplish. And it doesn’t get easier with newer hardware either – as computers get faster, games generally incorporate more complex terrain and more inhabitants, so pathfinding stays complex. For example, the average RTS devotes 30% of CPU time to video (prepping stuff for the GPU, plus UI etc), 30% on AI & scripts, and 30% on pathfinding (the remaining 10% on everything else).

Icefall uses two different pathfinding algorithms to improve efficiency. Being turn-based, I don’t have to worry about time-critical calculations, so paths should always be optimal path (it’s acceptable for realtime games to take a ‘best guess’ or suboptimal path – you can see this in games like Dragon Age or Diablo: if you click somewhere far away or obscured by obstacles, your character will try to walk there but they get stuck). Let’s talk about how Icefall’s pathfinding works specifically.

A* Pathfinding Algorithm

A* (Wikipedia entry) is the method Icefall uses to calculate the player’s path. If you click on a distant doorway or shop, your character will move towards it using the fewest steps possible, regardless of how convoluted the actual path is.It works by starting at the player’s current location, and looking at all of the neighbouring squares (all the squares that the player could reach in a single move). For each of those squares, it looks at the *absolute* distance between this position and the goal position (that the player is trying to reach). The absolute distance is just ‘the distance in a straight line, assuming there were no obstacles’.

This (simplified) example shows a very simple dungeon with a start and goal square. The three initial candidate squares have been marked with red circles (this is called ‘the open list’ of squares), and the number within represents the absolute distance between each square and the goal.

Initial step of A* evaluation

The algorithm selects one of the squares from the open list that has the lowest absolute cost. So in this case, it could select either of the “6” squares. Let’s say it picks the bottom one (to try and go in a straight line). The process is then repeated, with all of the neighboring squares from our new square evaluated for their absolute distance to goal – but with a new step. We add to that distance the number of squares that we have already taken to get there (in this case, 1, since we’ve only moved once so far). Note that we also briefly consider the start square, and the other two red circles, but we discard them because they’re already in our ‘open’ list.

After evaluating the bottom '6' square

We’ve found two new candidate squares to add to the ‘open list’ – both are of absolute distance to goal 5 and have a distance travelled of 1 (+1 in the diagram). Because we’ve now fully evaluated that bottom ‘6’ square, it’s moved from the open list to what we call the “closed list” (which just makes sure we don’t end up re-evaluating squares that we’ve already done).

Now, the process continues by picking one of the squares in the open list with the lowest absolute distance + distance already travelled. If multiple squares have the same total, we pick a square with the highest distance already travelled first – this ensures that if there is a direct path to the goal, we find it immediately. If we continue this for a few move evaluations, we’ll end up with something like this.

After evaluating 4 more squares

We evaluated the next two squares in the dead end, as they both had a combined total of 6, and they had the highest distance already travelled values. But then we reached a dead-end there (there were no valid moves from that 3/+3 square), so we next evaluate the remaining 5/+1 square. on the open list. It finds two new squares to add, a 6/+2 and a 5/+2. After this picture, the lowest total value that’s still open is actually the original red circle (it has a cost of 6. Our other open squares have values 7/0, 6/+2, and 5/+2). When evaluating it’s neighbors, we find that it has a shorter route to the square directly above it. Instead of 6/+2, it can be replaced with 6/+1 – this is fewer squares travelled to reach the same goal. So this square value is replaced in the open list. Evaluation then continues. Eventually, we’ll end up with the following table, and the optimal path.

 

Final path determined

In this contrived example, we end up evaluating almost every square, but the important part of this algorithm is that it will ALWAYS find the MOST EFFICIENT path possible. Apply these steps to any path (assuming the path can actually be reached), and you will get the best possible result in the end (note that this path has effectively “skipped the corners” in the tunnel, without us having to do any special coding to take this into account).

 

Summary

So, that’s how Icefall calculates paths for the player. Seems easy enough, but you can see a lot of squares may end up being evaluated (especially in a big dungeon that has dead ends). So, how do we calculate the pathfinding for monsters? Since there may be dozens or hundreds of these in an Icefall dungeon…? Stay tuned for next time!

 

2011
09.13

Lately, I have been rewriting some of the subsystems in Icefall. Now that the core of the gameplay is in place, it is time to go back and implement flexible/efficient versions of systems I originally coded in very fast to get the basic game up and running. Mainly this involves recoding subsystems to do things like handle errors more gracefully, respect the user settings (and respond correctly when these are changed on the fly), and cover everything in detailed comments so that future generations can understand how it worked.

The audio engine is the latest of these. It has been redesigned and now supports multiple channels of audio, with their own volume/mute controls, and a master control that lets you mute everything in one fell swoop. The game also ploughs on if it tries to initialise the audio library and fails – now it just skips loading any audio resources, disables and greys out all the audio UI settings, and carries on. Much better than popping an “Audio engine failed to initialise. Icefall will now exit” error message 🙂

Here’s a screenshot of the new Audio page in the Options dialog (game music and sounds respond to any of these being changed, in real time):

Icefall's Audio Options dialog menu

Audio Options

2011
05.30

Net Status

A small break from Icefall. My network has been a bit flakey recently, and it was irritating because it wasn’t always immediately obvious when it died. So I wrote a little application that can sit in the notification area and send a regular ping to a designated address/site, and make a sound if it suddenly dies (and when it comes back).

Also, I’ve been using C# for a couple of projects, and I wanted to step back from the managed, garbage collected goodness for a while. So I decided to forego all UI toolkits and just write direct Win32 API calls to create all my UI elements. Painful, yes! But fun too 🙂 Here’s the application:

WinNetStatus Application

It also includes full FreePascal source code. I managed to remove all external dependencies on other code, so it will compile solely from the FPC Win32 RTL. I’ve also commented almost every line, so on the off chance someone is as crazy as me and wants to see how to program the raw Win32 API from FPC, this is probably a good example!

You can download the application (and source code) from this link.

2011
02.10

Sometimes, you produce a UI that makes perfect sense to you, but when you introduce it to another player, they can’t follow it.

That happened to me with Icefall’s UI bar. Originally, my layout placed the player character’s health and mana bars in the bottom left corner, with the player’s current target displayed next to them. The action buttons and experience bar were in the bottom centre, and the status window was on the right:

Original UI Layout

Original UI Layout, with player/target health bars at the bottom left.

This made sense to me, and was based on Angband displaying the player/monster health stats in that corner. However, when I got a couple of friends to play it, both of them struggled to track the player’s health without me explicitly pointing it out, and one got the health bar confused with the experience bar in the centre.

So, as a result of this, I did some tweaking of the layout and the UI graphics. I’ve put the player and target information in the centre, and shuffled the action bars to the left hand side. I have to admit, after playing for a while, this configuration makes more sense:

New UI Layout, with player/target health bars in the centre

I also took the opportunity to update the UI backdrop to make the player/target area jump out a little more with a “leather canvas”-type frame.

The lesson? Always test the UI on people who have never seen the game before, and be resigned to the fact that you might have to change it.

2010
11.20

Hi again.

Although I haven’t posted anything for a long time, I have still been (occasionally) developing Icefall. There are three new concepts which I have introduced since my last Icefall post – two enhancements to the visual engine, and one to the game core. Today I’ll talk about one of the visual enhancements – seamless textured walls.

If you look carefully at some earlier Icefall screenshots, you’ll see the white and teal-coloured wall graphics. I actually thought these were pretty cool at the time (certainly an improvement over most Angband clones’ “single tile for every wall”, as they had a slight 3D aspect, and wall tiles “linked up” with those next to them. The only thing I wasn’t happy about was when there was a wall that was exactly two tiles thick, and the player could see both sides (e.g. they were two separate rooms), these ‘wall texture’ bubbles would appear in the centre of the wall – this was because the tiles didn’t check their diagonal neighbours, so they had to assume there wasn’t wall there (checking in the old engine would have meant I would need 256 different wall tiles for every combination of neighbours, instead of 16 when they only link up horizontally and vertically).

Old-style Icefall wall tiles

But, this was fixable! I have recreated the whole wall-drawing algorithm so that it now takes three passes instead of one. The first pass draws a nice ‘wall texture’ (the actual texture can vary depending on the level, so the deeper you go into the dungeon, the walls themselves will start to look more sinister).

Secondly, an “outward-facing” wall texture is applied. This fixes up all the edges between wall tiles and ground tiles, similar to what happened before except that this tileset has no diagonal corners. Finally, another pass is made over all wall-edges, looking solely for diagonals and filling those in as needed.

Although this takes more GPU power than the old method, GPU power is not something I’m really worried about for Icefall, as the visual engine overall draws the game world very efficiently. (I must talk about this sometime if I haven’t already). And the results, hopefully you will agree, look a lot nicer!

New wall-drawing algorithm.

Next time (which I promise will not be as far away as this update was from the last!), I will talk about displaying the player character themselves – as they’re now displayed in the gameworld with the equipment/weapons they have equipped!

2010
06.22

Xbox 360 – KINECT

A quick look at the Xbox 360 Kinect… looks good. Note that it’s much faster response time on the guy in the red sweater compared to the chick in the black and white. At previous demos, the demonstrators have been wearing red too. Apparently, they worked firstly on fine-tuning the system to minimise latency on red, and they’re working to bring other colours/patterns up to the same speed between now and launch (November). I think this shows the latency could be VERY acceptable!