Rendering 2000 enemies at 50 fps
Rendering 4 different types of enemies, at a count of 2000, at 50 fps
In retrospect this seems extremely obvious, but I'll walk through the steps.
When I started this game, my engine was set up to use partial animations for everything. That's because the game I made the Eight Winds Engine for, Eight Winds, needs the player character to be able to play 3 animations asynchronously without any interdependence. I had had this system for so long, nearly 9 months, that I had forgotten it was even there. Even though replacing this system was fairly easy, it led me down a route of optimizations that I might not have otherwise gone down.
Rendering Optimization
I started with instancing, which was fairly easy.
Before instancing, even at 100 monsters I was dipping below 144 rendering frames per second.
Without instancing, I used a push constant to push a model matrix to the GPU per monster.
I also used the push constant to tell the GPU what the beginning bone offset was. Each monster type had a unique bone count, so when I was binding the buffers I tracked how many bones I had already bound, then pushed that along with the model matrix to get the correct bone set.
For example, if I had 2 deer and a lich, the first deer would have a bone offset of 0, the second deer would have a bone offset of one deer bone count, and the lich would have a bone offset of 2 deer bone counts. If I added a devil after that, the devil would have the lich's bone offset + the lich bone count.
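The function was a bit like this, where monster was a base class that deer, liches, and devils could derive from:
    uint32_t boneOffset = 0; // running total of bones bound so far
    for (auto& monster : monsterContainer) {
        monster.boneOffset = boneOffset;      // where this monster's bones start
        boneOffset += monster.getBoneCount(); // skip past this monster's bones
    }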
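To make the deer/lich/devil example concrete, here is a minimal self-contained sketch of the same running-total idea. The struct names, the push constant layout, and the bone counts are all made up for illustration; they are not the engine's actual types.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical per-draw payload: one of these would be pushed per monster.
struct MonsterPushConstant {
    std::array<float, 16> modelMatrix; // 4x4 model matrix
    std::uint32_t boneOffset;          // index of this monster's first bone
};

// Vulkan only guarantees 128 bytes of push constant space, so keep it small.
static_assert(sizeof(MonsterPushConstant) <= 128, "push constant too large");

// Hypothetical stand-in for the monster base class.
struct Monster {
    std::uint32_t boneCount;
    std::uint32_t boneOffset = 0;
};

// Gives each monster the total bone count of everything bound before it.
// Returns the total number of bones across all monsters.
std::uint32_t assignBoneOffsets(std::vector<Monster>& monsters) {
    std::uint32_t boneOffset = 0;
    for (auto& monster : monsters) {
        monster.boneOffset = boneOffset;
        boneOffset += monster.boneCount;
    }
    return boneOffset;
}
```

With assumed counts of 20 bones per deer and 35 for the lich, two deer and a lich get offsets 0, 20, and 40, and a devil added after them would start at 75.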
So moving on to instancing
Keeping track of the bone count like that, and using a push constant for each monster, was taking up quite a bit of time, so I decided to write a vertex shader for each monster type and give each monster type its own bone matrix buffer.
With a shader for each unique bone count, instead of calculating bone offsets and pairing them with the model matrix, I could just write model matrices, and use the instance index in the vertex shader to keep track of my bone offset and find my model matrix in a larger buffer.
Here's a before and after with the deer vertex shader code (GLSL version 450) - https://pastebin.com/93jcfYCN
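The pastebin has the real shader. As a rough CPU-side sketch of the indexing idea only: once every instance drawn by a shader has the same bone count, the bone offset can be derived from the instance index alone. I'm assuming here that each instance's bones are stored back to back in the per-type buffer; the names and the bone count of 20 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Mat4 { float m[16]; };

// Hypothetical bone count for one monster type; in a per-type shader this
// would be a hard-coded constant.
constexpr std::uint32_t kDeerBoneCount = 20;

// First bone of an instance, assuming instances are packed back to back.
constexpr std::uint32_t firstBoneOfInstance(std::uint32_t instanceIndex,
                                            std::uint32_t boneCount) {
    return instanceIndex * boneCount;
}

// Mirrors the lookup a vertex shader would do for one bone influence.
const Mat4& fetchBone(const std::vector<Mat4>& boneBuffer,
                      std::uint32_t instanceIndex,
                      std::uint32_t boneCount,
                      std::uint32_t localBone) {
    const std::size_t index =
        firstBoneOfInstance(instanceIndex, boneCount) + localBone;
    assert(localBone < boneCount && index < boneBuffer.size());
    return boneBuffer[index];
}
```

The model matrix lookup is the same idea with a stride of 1: instance i reads entry i of the model matrix buffer, so nothing per-monster needs to be pushed from the CPU.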
With instancing, I could render nearly 200 monsters before dropping below 144 rendering frames per second, up from 100.
At 900 monsters this brought my performance up from 8 fps to 12.
That's still not quite what I've been advertising. The final issue was with partial animations. My partial animation system went through each active animation, and each active animation only affected the bones that it changes frame to frame. So basically, I would copy the default bone pose, then iterate over each active animation's bones and copy them into the final pose, which was then copied to a larger buffer that was written to the GPU.
I changed it so that the file format recorded ALL bones for an animation, and then the full animation system would just copy the whole animation frame over to the large buffer that was written to the GPU. I immediately saw an improvement to 120 fps at 1000 monsters. And this was all in Debug mode. Once I swapped to Release mode it was a full 144 fps at 1000 monsters, and 50 fps at 2000.
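A minimal sketch of that partial animation path, to show where the per-monster work goes. The types and function name are hypothetical, not the engine's real ones; the point is the copy of the default pose plus a scattered write per animated bone, before the result is copied again into the GPU-bound buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Mat4 { float m[16]; };

// One animated bone in a partial animation frame (hypothetical layout).
struct BoneKey {
    std::uint32_t boneIndex;
    Mat4 transform;
};

// A partial frame only lists the bones the animation actually changes.
using PartialFrame = std::vector<BoneKey>;

// Builds the final pose: start from the default pose, then let each active
// animation overwrite just the bones it touches.
std::vector<Mat4> composePose(const std::vector<Mat4>& defaultPose,
                              const std::vector<PartialFrame>& activeFrames) {
    std::vector<Mat4> finalPose = defaultPose; // full copy of the default pose
    for (const PartialFrame& frame : activeFrames) {
        for (const BoneKey& key : frame) {
            assert(key.boneIndex < finalPose.size());
            finalPose[key.boneIndex] = key.transform; // scattered write
        }
    }
    return finalPose; // still has to be copied into the larger upload buffer
}
```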
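For comparison, a sketch of what the full animation path boils down to, assuming a frame stores one matrix for every bone in skeleton order. The function name and the staging buffer are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Mat4 { float m[16]; };

// Copies a complete animation frame straight into the large buffer that gets
// written to the GPU, starting at this monster's first bone slot.
void writeFullPose(const std::vector<Mat4>& frame,
                   std::vector<Mat4>& staging,
                   std::size_t firstBone) {
    assert(firstBone + frame.size() <= staging.size());
    std::copy(frame.begin(), frame.end(),
              staging.begin() + static_cast<std::ptrdiff_t>(firstBone));
}
```

There is no default pose copy, no per-bone indexing, and no intermediate pose: just one contiguous copy per monster. The trade-off is that animation files get larger, since every frame stores every bone.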
I had a handful of intermediate optimization steps, but that's the beginning and end. If I forced every monster to have an identical bone count at 15-30 bones I could merge all the vertex shaders into one and probably see a further performance increase to allow 60 fps at 2000 monsters.
Currently the game doesn't have collision, mostly because I'm running the logic steps 250 times per second. That's what Eight Winds uses, and therefore the engine, and I didn't feel like changing all my tools and animations to match 60 merely for a game jam. If I slowed down the logic steps to a more reasonable 60 ticks per second, I could add collision and the game might still run faster.