02.01
When your project is nearing completion, it’s time to look at optimization. For Direct3D, this typically means grouping and reordering work so that the hardware can run as efficiently as possible.
For Direct3D 9, Microsoft have published a handy guide to the things worth optimizing. But it’s interesting to see what difference these changes make in the real world, to a real game on real PCs, especially slower ones!
The real-world case in this blog post is based around my Five Hundred game. Although it’s a 2D game, it’s pure Direct3D 9 code, so most of the optimizations still apply. While developing the game, I was mentally building a list of things “to do later” to help optimize. Note that it’s not a hugely demanding game, so the goal wasn’t to squeeze out an extra frame per second on a modern system. The primary objective was to make the game playable even on low-end, XP-based systems with integrated graphics, since I imagine a lot of the card-playing public might well have those systems lying around.
This was my list of things to do, in what I expected to be order of biggest to smallest gain:
- Minimize texture changes per frame (e.g. draw all of the card backs at once)
- Bundle up objects into the same vertex buffers, minimizing DrawPrimitive calls
- Change render states so that alpha-transparency is only enabled for the textures that use it
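The first two items can be sketched roughly as follows. This is a minimal, self-contained C++ illustration, not the game’s actual code: `Vertex`, `Sprite`, `FrameStats`, `drawNaive` and `drawBatched` are hypothetical names, and the counters mark where real Direct3D 9 code would call `IDirect3DDevice9::SetTexture` and `IDirect3DDevice9::DrawPrimitive`. Sprites are sorted by texture, and each run of same-texture sprites is appended to one vertex array, so each texture is bound once and drawn with a single call.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical vertex layout: screen position plus texture coordinates.
struct Vertex { float x, y, u, v; };

// Hypothetical sprite: textureId stands in for an IDirect3DTexture9*.
struct Sprite {
    int textureId;
    float x, y, w, h;  // screen-space rectangle
};

// Counts of the work that would reach the device in real Direct3D 9 code.
struct FrameStats {
    int textureChanges = 0;     // IDirect3DDevice9::SetTexture calls
    int drawCalls = 0;          // IDirect3DDevice9::DrawPrimitive calls
    int verticesSubmitted = 0;  // vertices sent across all draw calls
};

// Append the two triangles (six vertices) of a textured quad.
static void appendQuad(std::vector<Vertex>& out, const Sprite& s) {
    const Vertex tl{s.x, s.y, 0.0f, 0.0f};
    const Vertex tr{s.x + s.w, s.y, 1.0f, 0.0f};
    const Vertex bl{s.x, s.y + s.h, 0.0f, 1.0f};
    const Vertex br{s.x + s.w, s.y + s.h, 1.0f, 1.0f};
    out.insert(out.end(), {tl, tr, bl, bl, tr, br});
}

// Unoptimized: one SetTexture and one six-vertex DrawPrimitive per sprite.
FrameStats drawNaive(const std::vector<Sprite>& sprites) {
    FrameStats stats;
    stats.textureChanges = static_cast<int>(sprites.size());
    stats.drawCalls = static_cast<int>(sprites.size());
    stats.verticesSubmitted = 6 * static_cast<int>(sprites.size());
    return stats;
}

// Optimized: sort by texture, then emit one batched draw per texture.
FrameStats drawBatched(std::vector<Sprite> sprites) {
    // Group sprites that share a texture; stable_sort keeps their relative order.
    std::stable_sort(sprites.begin(), sprites.end(),
                     [](const Sprite& a, const Sprite& b) { return a.textureId < b.textureId; });
    FrameStats stats;
    std::vector<Vertex> batch;
    // Real code would copy `batch` into a dynamic vertex buffer and call
    // DrawPrimitive(D3DPT_TRIANGLELIST, 0, batch.size() / 3) at this point.
    auto flush = [&]() {
        if (batch.empty()) return;
        ++stats.drawCalls;
        stats.verticesSubmitted += static_cast<int>(batch.size());
        batch.clear();
    };
    int currentTexture = -1;  // no texture bound yet
    for (const Sprite& s : sprites) {
        if (s.textureId != currentTexture) {
            flush();                 // draw everything using the previous texture
            ++stats.textureChanges;  // SetTexture for the new texture
            currentTexture = s.textureId;
        }
        appendQuad(batch, s);
    }
    flush();  // draw the final batch
    return stats;
}
```

For example, six cards alternating between a shared back texture and two face textures cost six texture changes and six draw calls with `drawNaive`, but three of each with `drawBatched`, while the same number of vertices is submitted. One caveat: sorting changes draw order, so it is only safe for sprites whose overlap doesn’t depend on that order; `std::stable_sort` at least preserves the original order within each texture.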
I implemented all of the above, but as an interesting side project, I ran full evaluations after making each change (and with each change in isolation from the others) across all my test systems to see where the benefits were. The results? Curious!
Optimization Results
The benefits of each optimization varied greatly between systems! To summarize, I’ll group them into three categories: “high-end” (in this case, a GeForce GTX 770 with 4GB), “mid-range” (a mobile Radeon with 1GB) and “low-end” (an Intel Q35 Express chipset with GMA integrated graphics). The table shows the frame-rate improvement from each change applied in isolation.
Optimization | High-End | Mid-Range | Low-End |
---|---|---|---|
Minimize texture change | 45% | 25% | 5% |
Group vertex buffers | 15% | 40% | 0% |
Minimize alpha | 0% | 20% | 80% |
What does this mean? In short, Microsoft’s suggested optimizations work well for high-end systems. Minimizing texture changes and DrawPrimitive calls brought substantial performance benefits on those systems. Collectively, this is “reducing overhead” stuff. It’s worth noting that these systems blaze through drawing Five Hundred without breaking a sweat anyway: the 45% gain from minimizing texture changes on the high-end system meant going from roughly 2,000 FPS to roughly 3,000 FPS.
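Frame times help put the high-end numbers in perspective. The small sketch below (the helper names `frameTimeMs`, `fpsGainPercent` and `frameTimeSavedMs` are just for illustration) converts frame rates into milliseconds per frame: 2,000 FPS is 0.5 ms per frame and 3,000 FPS is about 0.33 ms, so the whole gain saves roughly 0.17 ms per frame, against the roughly 16.7 ms available per frame at 60 FPS.

```cpp
#include <cassert>
#include <cmath>

// Milliseconds spent on each frame at a given frame rate.
double frameTimeMs(double fps) { return 1000.0 / fps; }

// Percentage increase in frame rate when going from fpsBefore to fpsAfter.
double fpsGainPercent(double fpsBefore, double fpsAfter) {
    return (fpsAfter / fpsBefore - 1.0) * 100.0;
}

// Milliseconds saved per frame by the change.
double frameTimeSavedMs(double fpsBefore, double fpsAfter) {
    return frameTimeMs(fpsBefore) - frameTimeMs(fpsAfter);
}
```

With the rounded figures, `fpsGainPercent(2000, 3000)` gives 50%, close to the table’s 45%; either way, `frameTimeSavedMs(2000, 3000)` is only about 0.17 ms per frame.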
However, doing those things had almost *no* benefit on the low-end system. I suspect this is because the hardware there is fully occupied with the drawing operations themselves, so it didn’t matter how efficiently the calls were being made: the hardware was already maxed out. The one change that benefited the Intel GMA the most actually runs contrary to Microsoft’s recommendation to “minimize state changes as much as possible”. My testing showed that it was extremely beneficial for low-end hardware to turn alpha-transparency off whenever it’s not used, *even for just a single draw call before re-enabling it*. The high-end system, in contrast, didn’t care in the slightest whether it was rendering with alpha-transparency or not: it gained nothing (but didn’t suffer from the extra render state changes either).
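One way to sketch the “turn alpha off whenever it isn’t used” approach is shown below, combined with a common companion technique that is my addition rather than something described above: filtering out redundant state changes. `MockDevice`, `DrawItem`, `AlphaStateCache` and `drawAll` are hypothetical; in real Direct3D 9 code, `MockDevice::setAlphaBlend` would be a call to `IDirect3DDevice9::SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE)` or `FALSE`. Each draw gets exactly the alpha state it needs, and the device is only touched when that state actually changes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for IDirect3DDevice9. In real code, setAlphaBlend would call
// device->SetRenderState(D3DRS_ALPHABLENDENABLE, enabled ? TRUE : FALSE).
struct MockDevice {
    bool alphaBlend = false;
    int stateChanges = 0;
    int draws = 0;
    void setAlphaBlend(bool enabled) { alphaBlend = enabled; ++stateChanges; }
    void draw() { ++draws; }
};

// Hypothetical draw item: does this texture actually need alpha blending?
struct DrawItem { bool needsAlpha; };

// Remembers the last alpha-blend state it set, so the device is only
// touched when the required state differs from the current one.
class AlphaStateCache {
public:
    explicit AlphaStateCache(MockDevice& device) : device_(device) {}
    void require(bool enabled) {
        if (!known_ || enabled != current_) {
            device_.setAlphaBlend(enabled);
            current_ = enabled;
            known_ = true;
        }
    }
private:
    MockDevice& device_;
    bool current_ = false;
    bool known_ = false;  // state unknown at first, so the first request always goes through
};

// Enable alpha blending only for the draws that need it, even if that means
// switching it off for a single opaque draw and straight back on afterwards.
void drawAll(MockDevice& device, const std::vector<DrawItem>& items) {
    AlphaStateCache cache(device);  // fresh cache per frame in this sketch
    for (const DrawItem& item : items) {
        cache.require(item.needsAlpha);
        device.draw();
    }
}
```

For a sequence of alpha, opaque, alpha, alpha draws, this issues three state changes for four draws: the single opaque draw still gets alpha switched off and back on, as the low-end results suggest it should, while the final draw simply reuses the state already set.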
The conclusion: for the best performance across a range of systems, you have to optimize everything. But just because an optimization doesn’t appear to speed anything up on *your* system doesn’t mean there isn’t a whole class of users out there who will feel the benefit!