Posts Tagged ‘optimisation’

Cityscape – update 5.2

Saturday, May 16th, 2009

Okay, the 16-bit/lack of Intel compatibility thing was bugging me, so I fixed it, and got, uhm, a few more polygons out of it as a result. How many? How about ~18.5 million polygons per second?

So, what have we done now? Well, I’ve changed the BuildingBatch class so that, instead of a single huge batch, it creates a bunch of smaller batches of a few thousand vertices each (the exact value is easily configurable – the parameter passed into UpdateGeometry() is the maximum number of vertices per buffer). This means I can change the index buffer type back to 16-bit, which means it works on Intel graphics cards again, and also squeeze even more performance out of the graphics card. The reasons why a single massive buffer is less efficient than a number of smaller-but-still-large buffers are complex, but it’s generally agreed that the optimal number of vertices per buffer is in the low thousands, and experimenting with my buffer sizes bears this out. If you want a play, you can check out revision 15 from the repository and tweak the batch size in Game.cs – I’d be interested to hear what works best for your graphics card.
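As a rough illustration of the splitting idea (a Python sketch, not the project's actual C# code), the function below groups buildings into batches that never exceed a vertex limit and rebases each building's indices so they stay local to its batch. The name `split_into_batches` and the `(vertices, indices)` tuple layout are assumptions made for this example.

```python
def split_into_batches(buildings, max_vertices):
    """Group (vertices, indices) pairs into batches of at most max_vertices.

    Hypothetical helper: each building is a (vertices, indices) tuple whose
    indices are local to that building.
    """
    batches = []
    verts, inds = [], []
    for b_verts, b_inds in buildings:
        if len(b_verts) > max_vertices:
            raise ValueError("a single building exceeds the batch limit")
        # Start a new batch if this building would overflow the current one.
        if verts and len(verts) + len(b_verts) > max_vertices:
            batches.append((verts, inds))
            verts, inds = [], []
        base = len(verts)  # offset of this building inside the current batch
        verts.extend(b_verts)
        inds.extend(base + i for i in b_inds)
    if verts:
        batches.append((verts, inds))
    return batches


# A single quad stands in for a building here.
quad = ([(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)], [0, 1, 2, 0, 2, 3])
batches = split_into_batches([quad] * 5, max_vertices=8)
print([len(v) for v, _ in batches])  # [8, 8, 4]
```

Because every index is relative to the start of its own batch, the largest index in a batch is always below `max_vertices`, which is what lets each batch use a 16-bit index buffer.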

I’ve got another update coming shortly, but it’s sort of tangential to the main project, so I’ll stick it in another post.

Cityscape – update 5.1

Friday, May 15th, 2009

So, yeah. I figured out why it didn’t work on my NC10, and it’s for reasons I’d hoped I wouldn’t have to worry about just yet.

See, XNA and DirectX like to pretend that all graphics hardware is equal and can all do the same things and that. And it does quite a good job, so long as all you want to do is draw a few boxes, which is, uh, what we’ve been doing so far. Unfortunately, not all graphics hardware is equal, and actually, there’s some fairly big differences between them.

For example, index buffers. Index buffers are simply a list of numbers that represent indexes into your vertex buffer. You can pick between two formats – 16-bit and 32-bit. With 16-bit, you’re restricted to 2^15, or 32,768, vertices (because, for some reason, they’re apparently signed values, even though a negative index makes no sense). A 32-bit index buffer gives you indices all the way up to 2^31, or about 2 billion – far more than you’re likely to need. Previously, I’ve been using 16-bit index buffers, as my vertex buffers have been small and so I didn’t need the capacity of a 32-bit buffer. However, with the change to one single, huge vertex buffer, I needed more than 32,768 vertices and thus switched to a 32-bit index buffer.
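The arithmetic behind those limits can be checked with a few lines of Python. This is only a sketch: the helper names are invented for the example, the signed interpretation follows the description above, and the figure of 24 vertices per box (four per face) is an assumption, not a number taken from the project.

```python
def index_capacity(bits, signed):
    """How many distinct non-negative index values fit in this integer width."""
    return 2 ** (bits - 1) if signed else 2 ** bits


def pick_index_bits(vertex_count, signed=True):
    """Return 16 if 16-bit indices can address every vertex, else 32."""
    if vertex_count <= index_capacity(16, signed):
        return 16
    if vertex_count <= index_capacity(32, signed):
        return 32
    raise ValueError("too many vertices for a single index buffer")


VERTICES_PER_BOX = 24  # assumption: 6 faces x 4 vertices, no sharing
city_vertices = 41 * 41 * 4 * VERTICES_PER_BOX  # 41x41 buildings, 4 boxes each

print(index_capacity(16, signed=True))   # 32768
print(index_capacity(32, signed=True))   # 2147483648
print(city_vertices)                     # 161376
print(pick_index_bits(30_000))           # 16
print(pick_index_bits(city_vertices))    # 32
```

Under that assumption, a single buffer for the whole city is well past the 16-bit limit, and it would still be too large if the indices were treated as unsigned (65,536 values).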

Unfortunately, the Intel drivers (the vertex shadey part of the equation actually happens in software on Intel chipsets, although they do have hardware pixel shader support) don’t seem to support 32-bit index buffers. I can create it, stick my indices into it and even successfully call DrawUserIndexedPrimitives<> – but nothing appears onscreen. Culling the number of buildings and using a 16-bit index buffer means it works fine.

Again, there are things I can do that will help this – split across several 16-bit index buffers, for example – and it’s actually not necessarily the most efficient thing to render using a single, massive vertex buffer, either. But there’s some optimisations I want to do later that will probably allow me to switch back to using 16-bit index buffers anyway, so for the moment, I’m sadly just going to ignore my little NC10 and stick to developing on machines with more grunt. Ah well.

Cityscape – part 5

Friday, May 15th, 2009

Get a load of this:

See that? That’s a 41×41 city, with 4 boxes per building – a whopping 80k polygons in total – running at our nice full 60fps. That’s over twice the polygon budget of the best of our previous versions, and dramatically more frames per second. How is such dark magic achieved?

Well, remember what we talked about last time? How graphics cards are much happier if you just give them a load of polygons and let them render them without interruptions? That, in a nutshell, is exactly what I’ve done here. Rather than rendering the buildings one at a time in a loop, we take advantage of the fact that buildings basically don’t move that often, and batch them all up into one big array as soon as they’re built. Then, instead of having to loop through a whole bunch of buildings, we just render the one buffer – much better.

Code-wise, we’ve had to do a bit of refactoring here – Buildings are no longer XNA GameComponents, and instead we’ve got a BuildingBatch DrawableGameComponent to handle that side of things for us. We create our buildings, add them to the BuildingBatch, then tell the BuildingBatch to update its geometry – whereupon it iterates through its collection of buildings, grabs their geometry and stuffs them all up in a single vertex/index buffer.
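The merging step can be sketched in a few lines of Python. This is an illustration of the general technique rather than the real BuildingBatch code; the `Building` class and `merge_geometry` function are stand-ins invented for the example. The important detail is that each building's indices have to be offset by the number of vertices already in the merged buffer.

```python
from dataclasses import dataclass


@dataclass
class Building:
    """Hypothetical stand-in for a building's geometry."""
    vertices: list  # (x, y, z) tuples
    indices: list   # triangle-list indices, local to this building


def merge_geometry(buildings):
    """Concatenate every building into one vertex list and one index list."""
    vertices, indices = [], []
    for b in buildings:
        base = len(vertices)  # where this building's vertices will start
        vertices.extend(b.vertices)
        # Rebase local indices so they point into the merged vertex list.
        indices.extend(base + i for i in b.indices)
    return vertices, indices


quad = Building(
    vertices=[(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)],
    indices=[0, 1, 2, 0, 2, 3],
)
merged_vertices, merged_indices = merge_geometry([quad, quad])
print(len(merged_vertices))  # 8
print(merged_indices)        # [0, 1, 2, 0, 2, 3, 4, 5, 6, 4, 6, 7]
```

After this, the whole set of buildings can be submitted as one buffer in one draw call instead of one call per building.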

Inevitably, there’s some tradeoffs: by doing our heavy lifting upfront, we limit the amount of stuff we can do once the game is running. So, there’s no easy way to do, say, object culling or dynamic level of detail using this sort of scheme: the time taken to copy all the buildings into the batch buffer is about 150ms, which isn’t long, but is much more than the roughly 16.7ms per frame we’ve got available if we’re trying to render 60 frames a second. And as our buildings get more and more complex, we’re inevitably going to hit an upper limit beyond which this approach no longer works either – but I’ve got some plans for how to deal with that, too.

Additionally, our buildings are all looking very uniform – as there’s no way to change shader parameters between each building, there’s no way to, say, pass in a custom colour modifier to liven things up a bit – but again, there are ways around this that I’ll talk about at some point later on.

We’re up to bzr revision 14 now – and for some reason, this latest version doesn’t work on my NC10. I am investigating why and should hopefully have a fix soon.

Cityscape – part 4

Friday, May 15th, 2009

Things have come on a bit since yesterday. I’ve added a first-pass of the building textures, and a polygon count (for reasons which will become apparent soon enough). Now, it looks like this:

The texture generation is simple enough (and mainly ripped off, like the rest of this project, from TwentySided’s PixelCity) – a 512×512 near-black texture, with 8×8 blocks of either light or dark grey scattered across it. To get the solid-black of the tops of the buildings, I’ve simply set the texture map to a single pixel at the bottom left of the texture.
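A minimal Python sketch of that kind of texture is shown below. It is not the project's code: the grey levels, the chance of a window being lit, and the one-pixel near-black margin around each 8×8 block are all assumptions chosen for the example, and the texture is a plain greyscale grid rather than a real GPU texture.

```python
import random


def make_window_texture(size=512, block=8, lit_chance=0.25, seed=0):
    """Build a size x size greyscale grid of 'windows' on a near-black field.

    All shades and probabilities are assumed values for illustration.
    """
    rng = random.Random(seed)
    background, dark_grey, light_grey = 10, 60, 200
    tex = [[background] * size for _ in range(size)]
    for by in range(0, size, block):
        for bx in range(0, size, block):
            shade = light_grey if rng.random() < lit_chance else dark_grey
            # Fill the block's interior, leaving a 1-pixel near-black margin.
            for y in range(by + 1, by + block - 1):
                for x in range(bx + 1, bx + block - 1):
                    tex[y][x] = shade
    return tex


texture = make_window_texture()
print(len(texture), len(texture[0]))  # 512 512
print(texture[511][0])                # 10 (bottom-left pixel stays near-black)
```

With this layout the bottom-left pixel is always background coloured, so pointing every roof vertex's texture coordinate at it gives a solid dark roof.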

Also, you’ll notice that I’ve added more buildings – in that screenshot, it’s a 31×31 grid of buildings (with random heights, for a bit of interest), and added a polygon count. Now, the reason I added the polygon count is interesting, and we’re going to learn a bit about how graphics cards work in the process.

So, I increased the number of buildings to 961, and the framerate stayed nice and happily at about 60fps. Excellent. Then, I got greedy, and made the grid 41×41, giving me 1681 buildings – and the framerate plummeted to around 40fps. This seemed odd – I’m not yet hitting really big polygon budgets. Games these days happily push hundreds of thousands, if not millions, of polygons per frame, and whilst XNA is doubtless adding some overhead, this machine manages to pull off things like Demigod and Dawn of War 2 on respectable settings without too much pain. So something, clearly, is up. I added a polygon counter just to check my sanity, and yes, we’ve nearly doubled our polygon budget, but it’s still only about 20k polys:

Hm. So, what do we do in this kind of situation? Well, we investigate! We’ve basically doubled our poly budget by doubling the number of buildings, right? Well, more or less, anyway. How about instead, we drop our number of buildings back to the original number, and increase their complexity? What happens then? So, I added the base back on, which is essentially just another box at the base of every building, and…

Wait, what? We’ve now got more polygons than we had in our 41×41 scenario and we’re still at ~60fps. Something odd is happening here. So what happens if we add more (mini-tower on top) and more (two stage body for the building) complexity? We can do three times as many polys as our original 31×31 city without a framerate drop – it’s only once we get to four times as many – forty-six thousand, over twice as many polygons as in our 41×41 city – that we start to see any impact at all, and it’s only a couple of fps. What the merry hell is going on here, then?

Well, remember how our buildings are constructed: for each box, we insert the vertices and indices into an array, then convert these to fixed-size vertex buffers for use when rendering. There’s one of these buffers per building. As we’re adding more complexity to our buildings, we’re increasing the amount of stuff in each buffer; however, when we add more buildings, we’re increasing the number of these buffers that need rendering – so it seems that it’s the number of buffers (or, more accurately, the number of draw calls we make) that’s our killer here.
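To put numbers on that, the short Python sketch below tabulates polygons and draw calls for the scenarios discussed above. It assumes 12 triangles per box (six faces, two triangles each) and one draw call per building; those assumptions are consistent with the polygon counts quoted in the post, but the function and constant names are invented for the example.

```python
TRIANGLES_PER_BOX = 12  # assumption: 6 faces x 2 triangles, nothing culled


def scene_stats(grid, boxes_per_building):
    """Return (draw_calls, polygons) for a grid x grid city, one call per building."""
    buildings = grid * grid
    return buildings, buildings * boxes_per_building * TRIANGLES_PER_BOX


for grid, boxes in [(31, 1), (41, 1), (31, 2), (31, 3), (31, 4)]:
    calls, polys = scene_stats(grid, boxes)
    print(f"{grid}x{grid}, {boxes} box(es): {calls} draw calls, {polys} polygons")
# 31x31, 1 box(es): 961 draw calls, 11532 polygons
# 41x41, 1 box(es): 1681 draw calls, 20172 polygons
# 31x31, 2 box(es): 961 draw calls, 23064 polygons
# 31x31, 3 box(es): 961 draw calls, 34596 polygons
# 31x31, 4 box(es): 961 draw calls, 46128 polygons
```

The 41×41 city is the only row where the draw-call count goes up, and it is also the only one where the framerate dropped sharply, even though it has fewer polygons than three of the 31×31 variants.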

And this is an important point: think of a graphics card as being like a car engine. You get most efficiency out of an engine when you drive smoothly, at a moderately high (but not excessive) speed, without changing speed too much; driving around a city where you’re constantly braking, changing gear, stopping, starting and so forth is absolute murder on your engine’s efficiency – and it’s the same thing for graphics cards. Hand them a big buffer of triangles to go off and render, and they’ll tear through it at high efficiency. Hand them a whole load of small buffers, or change texture or shader regularly, and they really start to struggle.

The reason for this, more-or-less, is that the time for a render call is split into two parts – a fixed setup cost, and a variable rendering cost. If you’re rendering a large number of small batches, your render time is going to be dominated by a very large number of fixed setup costs; however, if you render a single large batch, you only have to do this setup once, and then the rest of the time can be spent on actually drawing the triangles.
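That two-part cost can be written as a tiny model: frame time is roughly the number of draw calls times a fixed setup cost, plus the number of triangles times a per-triangle cost. The Python sketch below uses made-up constants purely to show the shape of the trade-off; they are not measurements from any real hardware.

```python
def frame_time_ms(draw_calls, triangles, setup_ms, per_triangle_ms):
    """Simple linear cost model: fixed cost per call plus variable cost per triangle."""
    return draw_calls * setup_ms + triangles * per_triangle_ms


# Hypothetical costs, chosen only for illustration.
SETUP_MS = 0.01
PER_TRIANGLE_MS = 0.0001

triangles = 1681 * 12  # same geometry in both cases

one_call_per_building = frame_time_ms(1681, triangles, SETUP_MS, PER_TRIANGLE_MS)
single_batch = frame_time_ms(1, triangles, SETUP_MS, PER_TRIANGLE_MS)

print(round(one_call_per_building, 2), "ms with one draw call per building")
print(round(single_batch, 2), "ms with everything in one batch")
```

With these invented numbers the per-call setup dominates the unbatched case, while the batched case spends almost all of its time on the triangles themselves.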

So, what can we do about this? Well, that’ll have to wait until next time – but remember, all of our buildings are using the same texture, shader, vertex format, etc – so an obvious solution should hopefully present itself!

(We’re up to bzr revision 13 now, if you’re following along at home)