This post comes a lot later than I had planned. Distractions by the names of Dragon Age and Civilization V got in the way. And when I did actually start to work on the project, I ran into some performance issues that took some time to figure out.
I have made major changes in both design and implementation of the original cube world.
-Removed the naive renderer as it was worthlessly slow for comparisons and was becoming troublesome in upkeeping with the rest of the changes being made.
-Added visibility culling.
-Changed the way textures are handled to allow multiple types of cubes.
-Added a profiler to help track down performance issues.
Visibility Culling
I decided to implement spatial partioning with a quadtree of fixed height, instead of an octree. Considering my terrain isn’t very vertical, I didn’t see any major gains by adding height into the equation and essentially increasing the number of checks when traversing the tree.
To do the visibility checks, I construct a bounding frustum out of the current view and projection matrices from the camera. Then, I traverse the tree looking for nodes that intersect the frustum. No cube data is stored in the upper levels. They are purely there for subdivision. All of the cube data is in the deepest nodes. And since I constructed the tree to exactly fit over my terrain and to be a power of 2, all of the cubes are exact fits in the tree at the lowest levels. How I deal with the cube data of visible cubes depends on the current drawing technique.
With mesh instancing, I ran into a few problems. I was initially trying to traverse all 9 terrain patches and maintain a list of all of the visible cubes and draw everything in one batched instance draw. However, each of the terrain patches is centered on the origin. So, to draw them in the appropriate places, I would have to offset each position by adding an offset value appropriate to the patch.
That turned out to be fairly slow. Profiling showed that I was spending most of the time in the copying of instance data into the list of current positions. Initially, I thought it was the adding of the offset value for every single instance that was slowing it down. So, I did a quick test. I removed the offset addition from the code. The patches would draw in the wrong spot, but I would get a better feel for how much performance gain I would get. I got nearly nothing. A few extra frames per second at best. That was not the issue.
The real issue was in how I was copying the array data. To accommodate the addition of the offset value, I was looping through the arrays and copying item by item. This was monumentally slower than using Array.Copy(). Array.Copy() is just a straight copy, though. I wouldn’t be able to use the offset value with that type of operation. So, I had to get rid of it somehow. I started with the easy route and went back to how I was drawing before. I drew each of the 9 terrain patches individually with different view matrices to offset their positions. Without visibility culling, this was very slow. But, with visibility culling (and alternating multiple vertex buffers), this proved to be quite efficient. I get 90+ FPS on the XBox with most of the time being above 100.
Something that I thought about doing was to create the terrain patches with the offset value. Terrain creation would not have been affected very much, as that is run in a separate thread. However, I would have to be more strict about which patch gets assigned to the nulled areas as position mattered at that point. The biggest problem that I foresaw was that when shifting the 6 patches that still remain on the board, I would have to shift their offsets. That could of been expensive and would need to be accomplished in a single frame in order to keep the illusion of seemless travel. If I could manage to do it, though, I would be able to draw the entire terrain in a single draw call again. I wasn’t feeling confident that it would work out, so I didn’t try it.
Working with the static buffers turned out to be a bit different. After finishing up the mesh instanceing code, I tried the same thing with static vertex buffers. Traverse the trees and add each visible vertex to a list of currently visible vertices and then eventually draw them. This proved to be incredibly slow, as I thought it might. The amount of data to copy far exceeded that of the mesh instancing, so it was naturally slower.
I decided to try drawing it all with no data copying. I created a vertex and index buffer in each low level quadtree node and populated them with the data for that quad alone. Then when it came time to cull visibility, instead of copying the data of the node to a master list, I simply drew it right there on the spot. I thought this would be pretty slow with a ton of draw calls. Turns out, I was wrong and I learned some new things about the behavior of video cards. This proved to be much faster than mesh instancing at 200+ FPS on the XBox.
I was going to add more improvements to the static vertices technique, like pruning out vertices that will never be visible or segregating the data out so that I could check against which faces are visible and only draw those. However, the speed was already double the work done in mesh instancing. So, for comparative purposes, I felt the job was done.
Multiple Cube Types
It wouldn’t be a proper Minecraft-like terrain engine if one could only draw one type of cube. So, I had to make sure that my technique could support many different types of cubes at workable speeds. I decided to use a spritesheet technique, so that it could all be supported with a single draw pass.
The current sprite-sheet holds the textures for 4 different cube types. The base cube’s texture coordinates are aligned to fit the first texture in the sheet. To draw the other cube types, simply change the texture coordinates. For the instancing technique, I added an offset to the vertex data being sent per instance to include an offset value. For the static vertex buffer technique, I simply changed the texture coordinates of the vertices in the buffers.
Selection of which cube texture to use was height based in the order of water, grass, stone and snow being the highest. This was just how I chose to generate them, but the code is in no way limited to only this arrangement. Support for cube types is only limited by how big of a texture you want to generate for the cubes.
Conclusions
My initial goal was to prove that mesh instancing could be just as viable as using a static vertex buffer for drawing this type of terrain. I did not achieve that goal. At 90+ FPS on the XBox, I did prove that mesh instancing could be used to create such a terrain. But, at double that performance, the static vertex buffer would be a superior choice. And that is without attempting further known optimizations for the static vertex buffer technique.
I bow down to the static vertex buffer of cubes.
The Code
The code can be found here. Again, it is written with XNA Game Studio 4.0. It is fully documented. Some implementation details can be found more in depth by reading the code and comments.
The controls remain the same, aside from removing the option for the naive renderer.
As with the previous project, I encourage anyone who finds anything useful in it to be used in your own projects. The previous considerations with regards to code that I used from other resources still pertain. In addition, the new textures for the cube types are a random sampling from Filter Forge.
I usually try to avoid arguments regarding picking the “best” implementation strategy. I believe in the cliche of using the right tool for the right job. Sometimes, that means using something sub-par for performance, because it is easier to implement. If you don’t have performance issues, why bother with the more complicated solution?
Anyways, a few weeks ago, I responded in a thread regarding improving a cube terrain renderer similar to what Minecraft does. My suggestion was to use mesh instancing. Having used instancing a lot in the past, I was very familiar with its beneficial gains. However, the idea was immediately dismissed by a few posters. Once again, right tool for the right job, and I believed that most of the ideas being presented could be made to be workable, including instancing. Instead of continuing the argument, I decided it would be better to put my code where my mouth was and prove it.
And this past weekend, I have started my attempt to do just that. Almost starting from scratch (more on that later), I created a prototype of a randomly populated infinite world of cubes. Since I was working towards prototyping different rendering techniques, I had modularity in mind. There are 3 working techniques being used, a naive renderer that renders each cube individually with basic effect; an instanced renderer that uses mesh instancing, and a static vertex buffer rendering that preprocesses the vertices into world space and renders the buffer as is.
Some background on the two important techniques as I understand them:
Mesh Instancing
Mesh instancing is a process where a large buffer is filled with identical copies of a model mesh. Along with those copies, one would also send state information about the model to help the GPU determine where the vertices should be ultimately placed. The instancing sample in the education catalogue sends a world matrix, but this could be anything.
The benefit over drawing the same mesh individually is far less draw calls. The amount of data being sent to the GPU is the same.
Static Vertex Buffers
This is an area that I will admit that I am less familiar with. However, as I understand it, this involves taking the vertices of everything related and combining them into a single mesh. By preprocessing the vertices into world space, one would eliminate the need to pass a world matrix to the shader. Also, certain techniques can be applied to the mesh to reduce redundant vertices and stitch everything together.
The end result of benefits over the naive mesh rendering is less work done with transformations, less data being sent to the GPU, and of course, less draw calls.
Comparing Static Buffers to Mesh Instancing
When looking at the two techniques, the right tool to draw a large unchanging landscape of cubes would appear to be static buffers. And it is a great tool for the job. Static buffers does really well on static content (this is demonstrated by the code that I have so far). It loses some of its luster on more dynamic content, as the preprocessing of transforming and restitching the vertices can be expensive.
However, I believe that mesh instancing can be just as viable for this problem. With a world of cubes that are axis aligned and built to the right size, no scale or rotation data is needed. Just translation. This reduces the extra data overhead to 3 floats per cube instead of 16 for a world matrix.
When adding in visibility culling, the data being sent to the GPU becomes more dynamic, something instancing does well.
By separating the terrain data into small patches, we can still benefit greatly with static vertex buffers. However, because each patch would not be the same size, every visible patch would need to be rewritten to the outgoing buffer every time the view changed. That means writing vertex data for every vertex in every cube to be drawn. Reducing and stitching together redundant vertices will reduce that amount, but it isn’t a guaranteed amount of savings. Worst case is still that every vertex is copied.
With mesh instancing, the vertex buffers always remain the same. They never need to be rewritten. As visibility changes, the only thing that would need to be updated is the state information per cube. Since, we reduced this work to be only position data, it is 3 floats per cube, worst and best case.
This is all CPU side work. Instancing wins in the worst case, unless I’m missing an obvious optimization of static buffers.
On the GPU side, static buffers are already done. Take the vertex and apply the view and projection matrix and shoot it to the pixel shader. With mesh instancing, we still need to apply the state information to the vertex. In this case, just add a vector3 to translate the position to world space. Not a lot of work, but it is per vertex, so it adds up. We also had to send that extra data to the GPU, creating more overhead.
The question becomes, does the reduction in CPU work outweigh the extra GPU work that needs to be done?
Conclusions so far
The code that I have written does not yet include visibility culling. Nor does it do any fancy stitching work for the static buffers. So far, on the Xbox, the application runs at 75+ fps with static buffers. With mesh instancing it is around 20+ fps. And with the naive rendering, it is a frame every couple of seconds.
Clearly, I am losing my argument, so far. If I had a game idea and wanted to use this code, I could go to production with static buffers without any further optimizations, assuming I didn’t get bottle necked elsewhere. Preliminary profiling suggests that everything is GPU bound. So, moving some of the work to the CPU with visibility culling should have benefits for all techniques. With any luck, that will move mesh instancing to a more competitive frame-rate.
The Code
I have included my code, so far. It is fully documented and written with XNA Game Studio 4.0. It can be found here.
The code generates randomized cube terrains and allows the user to traverse them. The world is infinite, automatically shifting and generating new terrain as the user crosses a boundary.
There are 9 patches of terrain with the user being in the center one. When the user crosses the boundary, their position is wrapped back to the other side and all terrain patches are shifted to give the user the illusion that they are continuing to seamlessly move forward. That means that 3 patches are dropped off with every shift and 3 more need to be made to replace them. I could have had the patches wrap to the other side, but with a real game that includes dynamic content, one of the concerns is garbage collection. I wanted to make sure that the prototype wasn’t being crippled by the collects that come with constantly loading in new content. So, the code will generate 3 brand new random patches. The UI displays a box for each of the 9 patches. When the terrain shifts, the patches that are now null (dropped) are lit up as red and when they are filled in with valid terrain, they are green.
The code is mostly original. The instancing shader was a direct copy from the instancing sample on the creators club education catalogue, which I then gutted and modified to this particular use. The static buffer shader is a copy of the final instancing shader without the extra data and position information, so that they would have the same lighting model for comparisons. The frame-rate counter is a modified version of Shawn Hargreave’s frame rate component. I derived the fractal terrain generation code from a description of the diamond-square algorithm that I read on this site. Finally, the texture used to color the cubes is Shawn’s immortal cat.
I have no issues with folks using my code for their own purposes, in fact, I encourage it and hope folks find it useful. However, proper legal measures should be considered for the aforementioned pieces of code. Even if you don’t agree with my argument above, I’m hoping some pieces of my code are inspirational to other projects.
Windows controls
Arrow keys left and right to rotate the camera’s view left and right.
Arrow keys up and down to move the camera forward and backwards.
Page up and Page down to rotate the camera’s view up and down.
A to change the rendering technique to static buffers.
S to change the rendering technique to mesh instancing.
D to change the rendering technique to naive rendering of each cube individually (don’t recommend this).
ESC to quit.
XBox controls
Right thumb stick to rotate the camera view.
Left thumb stick to move the camera.
X button to change the rendering technique to static buffers.
B button to change the rendering technique to mesh instancing.
Y button to change the rendering technique to naive rendering of each cube individually (don’t recommend this).
Back button to quit.
Known issues
There is a hiccup on windows when it shifts and regenerates new terrains. I haven’t done any indepth profiling to determine why. My initial guess is the that the multi-threading isn’t properly assigning the terrain generation code to another core. This does not occur on the XBox and it runs smoothly no matter how far you run.
Regenerating the terrain is very basic and does not remember any states. It just blindly fills in null terrain patches with random data. This means that going back to an old area will regenerate as a new random patch. Also, crossing a second boundary before it has finished working on a previous boundary crossing (possible when crossing at corners or when jumping back and forth on a boundary line) means that the order of filling in patches may be wrong, but they will all get filled in eventually.
Sidus Bellum was originally created as a school project to meet a two semester requirement for graduation. This week, we had to present our last milestone that will be due for the school project.
What does that mean? Well, it means that we are no longer obligated to work on Sidus Bellum. What it doesn’t mean is that Sidus Bellum is done and over with. We would still like to finish the game and release it for XBox Indie Games. However, development will now be a hobby and not a school assignment. For us, this is good news. We enjoy working on the game, but the strict milestone scheduling has been tough on our motivation. Now, we will be able to work on the features that we want to work on. And when we come across issues, we can handle them correctly, even if it means redoing entire portions of the code. Previously, we would hack something together to get it working in time for the deadlines.
Anyways, here are the updates that came with this milestone:
We have some game play elements added to the game. Objects have things like hitpoints and energy values and some of their actions are governed by these values. Objects are no longer “one shot, one kill.” Because of that, we also had to in better collision response, so that objects that run into other objects, but aren’t destroyed, will bounce off in a more realistic manner.
We have more than one projectile model. There is an improved version of the laser model that we have been using the entire time; a ball type projectile model was added; and our personal favorite, the lightning shuriken was added. The shurikens are hard to see, at the moment, due to them not having a glow effect like the other projectiles. I plan to add a glow effect to them later, which will look more akin to lightning.
A few new level types were implemented for the game. We have quite a few levels designed, but not all the functionality is in the game to make them a reality. There should be a good variety of game play options for the average player, though.
Some data saving and loading code has been added to allow users to change settings in the game and have the game remember them the next time they play. I wish that I had implemented this a while ago, since I am the only member of our team that likes to have the y-axis control inverted.
Finally, we had to create a teaser video to show off our game similar to how it might be advertised in a commercial. So, for eye candy this time, I have two videos for everyone. They are the same video, just with different sound tracks. They don’t show off anything new from the previously shown videos, except that ships will live longer and the UI is a bit more active, since we are now tracking hitpoints and the like.
So, for this milestone, I have been basically refactoring all of the game’s code in order to get nowhere…
First is the XML content control. We wanted the ability to design levels with XML scripts, instead of hard coding the levels into the game code. So, I’ve been working hard on abstracting all of the elements of every game object and combining them into a single object definition that can be filled in with serialized XML data. It is nearly complete, but enough is working to move on at this point.
Next was a complete overhaul of the collision detection system. The previous loose octree structure that we were using was starting to become a performance drain when large scale collision sweeps were needed (like the ones that I needed to do for the AI). The tree was just being made too deep and the game would spend too much time traversing the tree instead of performing checks. So, I tried implementing a few different systems. I have finally settled on a grid system that seems to be working well enough.
Our artist did get some new textures done and I did have some time to add some specular highlighting to the lighting model… so of course, we have a new eye candy video. Enjoy:
Sidus Bellum is a 3D space shooter for the XBox 360. It is being developed by a team of 4 with Microsoft’s XNA Game Studio. We have been developing the game as a school project to meet a 2 semester requirement for our Bachelor’s degree. Now that we are in our 2nd semester of working on the game, we have something that looks decent enough to start showcasing. So, I decided to start this blog about the project. This is our first public video of Sidus Bellum: