The back story
Last week I posted a first-look video of Rayclops and received a lot of positive feedback. One of the most requested additions was sea life. It was already on my list but just hadn't made it into the prototype yet.
A few days ago I started work on adding fish. At first you might think that throwing in a few low-poly models would be the easy way to go. But you'll soon notice that the ocean is huge, and a few fish just won't cut it. Furthermore, fish that swim in straight lines, even to random points, won't do either. Fish, just like birds in a flock, often travel in shoals. I needed a way to reproduce this behaviour, and to do it in real time.
Flocks of BOIDs (Birds in New York accent)
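Thankfully, a very smart chap named Craig Reynolds created an artificial life algorithm that does a very good job of this. Boids (short for "bird-oid objects") applies three basic rules to each agent: separation (steer away from neighbours that are too close), alignment (steer towards the average heading of nearby neighbours) and cohesion (steer towards the average position of nearby neighbours). Using these rules you can produce very believable flocking movement.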
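To make the three rules concrete, here is a minimal C# sketch of one steering update for a single agent. It is not the Rayclops code: it assumes positions and velocities live in plain arrays, uses System.Numerics.Vector3 so it runs outside Unity (UnityEngine.Vector3 offers equivalent operators), and the radii and weights are illustrative guesses.

```csharp
using System.Numerics;

// Minimal sketch of Reynolds' three boid rules for one agent.
// Assumptions: plain position/velocity arrays, illustrative radii and weights.
public static class BoidRules
{
    const float NeighbourRadius = 5f;   // agents closer than this influence us
    const float SeparationRadius = 1f;  // agents closer than this push us away
    const float SeparationWeight = 1.5f;
    const float AlignmentWeight = 1f;
    const float CohesionWeight = 1f;

    // Returns a steering vector for agent `self` to add to its velocity.
    public static Vector3 Steer(int self, Vector3[] positions, Vector3[] velocities)
    {
        Vector3 separation = Vector3.Zero;
        Vector3 velocitySum = Vector3.Zero;
        Vector3 positionSum = Vector3.Zero;
        int neighbours = 0;

        for (int i = 0; i < positions.Length; i++)
        {
            if (i == self) continue; // never compare an agent with itself

            Vector3 offset = positions[i] - positions[self];
            float distance = offset.Length();
            if (distance > NeighbourRadius) continue;

            neighbours++;
            velocitySum += velocities[i];
            positionSum += positions[i];

            // Separation: steer directly away from neighbours that are too close.
            if (distance > 0f && distance < SeparationRadius)
                separation -= offset / distance;
        }

        if (neighbours == 0) return Vector3.Zero; // alone: no steering

        // Alignment: match the neighbours' average velocity.
        Vector3 alignment = velocitySum / neighbours - velocities[self];
        // Cohesion: steer towards the neighbours' average position.
        Vector3 cohesion = positionSum / neighbours - positions[self];

        return separation * SeparationWeight
             + alignment * AlignmentWeight
             + cohesion * CohesionWeight;
    }
}
```

The caller would add the returned vector to the agent's velocity, typically clamped to a maximum speed. Note that the inner loop visits every other agent, which is what makes this approach O(n^2).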
Let’s not reinvent the wheel
After a quick search I found more than a few boids examples on the net. However, most of them used physics and colliders on individual boids, which clearly would not scale to large numbers. I also wanted to keep all the physics time for the actual submarine simulation.
Many of the others performed poorly with more than 200 fish agents and always caused the frame rate to tank. (Get it? Yes, a bad pun.)
Time to get my feet wet
So I needed my own plan. To start with, I wanted two or three groups of 100 to 200 fish, not including the other random individual sea life around each level. Each fish would be small, but together they would appear as one large shoal.
My first thought was that a particle system might work, moving each particle individually. I was able to move particles using GetParticles() and SetParticles(). However, getting the correct 3D rotation proved to be problematic.
Double sided what?
At this point I decided that a custom double-sided quad with flipped normals would be the best option. I did try a double-sided shader, but I'll get more into that in a minute. The fish only used a simple texture with an alpha map cut-off that I could swap out with ease.
Like many of the examples, I too started out using a boid MonoBehaviour for each agent, plus a controller object to spawn and keep track of them. This worked well enough at first, but once I got to around 1,000 fish, performance was bad again (under 35 FPS).
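The post doesn't show how the quad was built, so the following is only a sketch of the general idea in C#. DoubleSidedQuad and Build are hypothetical names, System.Numerics types stand in for Unity's, and the winding assumes Unity's convention that clockwise triangles face the viewer. The resulting arrays would map onto a Unity Mesh's vertices, normals, uv and triangles.

```csharp
using System.Numerics;

// Hypothetical helper: raw arrays for a quad that renders and lights on both sides.
public static class DoubleSidedQuad
{
    public static (Vector3[] vertices, Vector3[] normals, Vector2[] uvs, int[] triangles)
        Build(float width, float height)
    {
        float hw = width * 0.5f, hh = height * 0.5f;
        Vector3[] corners =
        {
            new Vector3(-hw, -hh, 0f), // 0 bottom-left
            new Vector3(-hw,  hh, 0f), // 1 top-left
            new Vector3( hw,  hh, 0f), // 2 top-right
            new Vector3( hw, -hh, 0f), // 3 bottom-right
        };
        Vector2[] cornerUvs = { new Vector2(0f, 0f), new Vector2(0f, 1f), new Vector2(1f, 1f), new Vector2(1f, 0f) };

        var vertices = new Vector3[8];
        var normals = new Vector3[8];
        var uvs = new Vector2[8];
        for (int i = 0; i < 4; i++)
        {
            // Front copy: normal faces -Z (towards a camera looking down +Z).
            vertices[i] = corners[i];
            normals[i] = -Vector3.UnitZ;
            uvs[i] = cornerUvs[i];
            // Back copy: same position and UV, flipped normal.
            vertices[i + 4] = corners[i];
            normals[i + 4] = Vector3.UnitZ;
            uvs[i + 4] = cornerUvs[i];
        }

        int[] triangles =
        {
            0, 1, 2,  0, 2, 3, // front faces (clockwise seen from -Z)
            4, 6, 5,  4, 7, 6, // back faces: same triangles, reversed winding
        };
        return (vertices, normals, uvs, triangles);
    }
}
```

Each corner appears twice because a vertex can carry only one normal. The back copy's flipped normal lets the second side light correctly, and its reversed winding stops it from being back-face culled.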
Profiler, when it worked
At this point I needed to find out why the code was slow. Enter the profiler, when it agreed to work for me. Several times it would hang or outright crash the editor. After a few attempts I got deep profiling working and soon found two problems.
The first was no surprise, as it has a rather well-known fix: Vector3.Distance(). I had two calls in the rules loop that were using around 47% of the time. Since the boid rules only need to compare distances, it was easy to replace them with Vector3.SqrMagnitude() on the difference vector, compared against the squared radius. This saves the square root that Distance() has to compute. This change increased my frame rate by about 8 FPS.
The second culprit behind my lower FPS was a big surprise: a rather odd entry called Object.op_Inequality(). I went back and looked at my code again… I had been using an array of my agents and, inside my for loops (not foreach), checked that:
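A small sketch of the two forms side by side, using System.Numerics.Vector3 (whose squared-distance helper is DistanceSquared) so it runs outside Unity. NeighbourTest and its methods are hypothetical names. Because both distances and radii are non-negative, comparing squares gives the same answer as comparing the distances themselves.

```csharp
using System.Numerics;

// Sketch: the same neighbour test with and without a square root.
public static class NeighbourTest
{
    // Slower form: Distance() takes a square root on every call.
    public static bool WithinRadiusSqrt(Vector3 a, Vector3 b, float radius)
        => Vector3.Distance(a, b) < radius;

    // Faster form: squared distance against a squared radius.
    // Compute radiusSquared once, outside the rules loop.
    public static bool WithinRadiusSquared(Vector3 a, Vector3 b, float radiusSquared)
        => Vector3.DistanceSquared(a, b) < radiusSquared;
}
```

In Unity the faster test would typically be written as `(a - b).sqrMagnitude < radiusSquared`.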
if (array[index] != this)
It was checking that "this" MonoBehaviour was not the item in the array, so an agent wouldn't run the rules against itself. Really, Unity? That test was costing me that much time per iteration? It turns out that UnityEngine.Object overloads the == and != operators so they also check whether the underlying native object has been destroyed, which makes them much more expensive than a plain reference comparison. The solution I came up with was to simply keep track of each agent's integer index. Because I was already using dependency injection to pass the controller reference to each agent when spawning it, it was trivial to also pass an integer index assigned by the controller. The test then became:
if (i != myIndex)
By comparing a single four-byte int instead of the objects, I gained another 5-7 FPS. With those two simple changes I had gained about 15 FPS. Slowly getting better, but we need more power.
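Here is a sketch of such an inner loop with the index-based self check, again using System.Numerics.Vector3 so it compiles outside Unity. FishAgent, CountNeighbours and the shared positions array are hypothetical names; the shared array also stands in for cached positions, so the loop never reads a transform.

```csharp
using System.Numerics;

// Hypothetical agent that knows its own slot in the controller's arrays.
public sealed class FishAgent
{
    private readonly int myIndex;          // assigned by the controller when spawning
    private readonly Vector3[] positions;  // cached positions of every agent, shared

    public FishAgent(int myIndex, Vector3[] positions)
    {
        this.myIndex = myIndex;
        this.positions = positions;
    }

    // Counts the other agents within `radius`, skipping this agent by index.
    public int CountNeighbours(float radius)
    {
        float radiusSquared = radius * radius;
        Vector3 me = positions[myIndex];
        int count = 0;
        for (int i = 0; i < positions.Length; i++)
        {
            // Plain int compare instead of an overloaded object comparison.
            if (i != myIndex && Vector3.DistanceSquared(positions[i], me) < radiusSquared)
                count++;
        }
        return count;
    }
}
```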
Time to get serious about larger numbers
With these changes I could get 2,000 boids running at about 35 FPS. The boid rules I'm using are O(n^2): every agent checks every other agent, so one complete update of 2,000 agents takes 2,000^2, or 4,000,000, iterations. I hadn't planned on having 2,000 fish in any level at the start, but even at smaller numbers it was still taking too much time.
Checking the profiler again showed that the next biggest cost was reading agent.transform.position. I solved this by caching each agent's position and rotation in public member variables. Again, not a huge increase, but every little bit added up.
Like peas in a pod
All the fish in each shoal use the same double-sided quad mesh with the same material. This made them a great fit for trying Unity's new GPU Instancing support.
I followed the documentation to get a shader template and modified it to use alpha cut-off. This worked and gave me another 2-5 FPS. I was a bit disappointed, as it was much less of a gain than I had hoped. Looking back at the profiler, the AI rules for each boid were still using too much CPU time, but there wasn't much left to optimize in that loop at this point. Below 2,000 fish, the game was CPU bound. Later, when I went over 5,000, the game became GPU bound and GPU instancing paid off (a 15+ FPS increase). Below that, it was actually faster to render with the standard shaders, which I can only assume is due to GPU overhead.
When Scotty can’t give you more warp speed
I knew the main thread was already running as fast as it could. Since Unity runs game scripts on a single main thread, all the AI updates had to share it with the rest of the engine. The next solution was to try threading. Threading is a powerful and very complex subject, with mutexes, race conditions and more headaches. Not to mention that Unity's API isn't thread safe and generally has to be called from the main thread. But before I could even begin, I needed to redesign the agent structure: I couldn't launch a separate thread for each of thousands of agents.
To do this, I moved the agents' rule processing into the controller. At startup the controller launched a separate thread that never called the Unity API directly. Each agent then polled the controller's data in its own Update() to get its new direction and speed. This gave one of the biggest increases and made a huge difference.
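A rough sketch of that controller layout in plain C#, assuming one worker thread and a lock around the shared buffers. FlockController, SetPosition, GetDirection and Generation are hypothetical names, and the per-agent rule is a cohesion-only stand-in for the full boid rules. The point it illustrates is that the worker only touches plain arrays; agents would call GetDirection() from their Update() on the main thread.

```csharp
using System;
using System.Numerics;
using System.Threading;

// Hypothetical sketch: the controller owns the flock data and runs the rules on
// one background thread that only touches plain arrays (never the Unity API).
public sealed class FlockController
{
    private readonly object gate = new object();
    private readonly Vector3[] positions; // latest positions reported by agents (guarded)
    private readonly Vector3[] published; // latest directions for agents (guarded)
    private readonly Vector3[] snapshot;  // worker-thread-only copy of positions
    private readonly Vector3[] results;   // worker-thread-only output
    private int generation;               // completed passes (guarded)
    private volatile bool running;
    private Thread worker;

    public FlockController(Vector3[] initialPositions)
    {
        positions = (Vector3[])initialPositions.Clone();
        int n = positions.Length;
        published = new Vector3[n];
        snapshot = new Vector3[n];
        results = new Vector3[n];
    }

    public int Generation { get { lock (gate) return generation; } }

    public void Start()
    {
        running = true;
        worker = new Thread(Run) { IsBackground = true };
        worker.Start();
    }

    public void Stop()
    {
        running = false;
        worker?.Join();
    }

    // Main thread (e.g. an agent's Update()): report the agent's current position.
    public void SetPosition(int index, Vector3 position)
    {
        lock (gate) positions[index] = position;
    }

    // Main thread: poll the most recently computed direction for this agent.
    public Vector3 GetDirection(int index)
    {
        lock (gate) return published[index];
    }

    private void Run()
    {
        while (running)
        {
            lock (gate) Array.Copy(positions, snapshot, snapshot.Length);
            ComputeDirections(); // runs without holding the lock
            lock (gate)
            {
                Array.Copy(results, published, results.Length);
                generation++;
            }
            Thread.Sleep(1); // give the CPU back between passes
        }
    }

    // Stand-in for the full boid rules: head towards the average position of the others.
    private void ComputeDirections()
    {
        int n = snapshot.Length;
        Vector3 sum = Vector3.Zero;
        foreach (Vector3 p in snapshot) sum += p;
        for (int i = 0; i < n; i++)
        {
            Vector3 toCentre = n > 1 ? (sum - snapshot[i]) / (n - 1) - snapshot[i] : Vector3.Zero;
            results[i] = toCentre == Vector3.Zero ? Vector3.Zero : Vector3.Normalize(toCentre);
        }
    }
}
```

Copying positions into a private snapshot means the lock is held only for the two array copies, not for the whole rules pass, so the main thread rarely has to wait.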
And that’s a tuna wrap
Now, with multi-threading, GPU instancing and better optimized code, I was able to run 10,000 fish at 60 FPS. Remember that this is O(n^2), so a complete loop over all the fish takes 10,000^2, or 100 million, iterations. A few other minor additions helped as well: some random delays (thread sleep) in the AI loop and a random chance to keep swimming straight. These also made the fish movements appear more organic. I think storing the boids in an R-tree or similar spatial structure would save even more time, but I'll leave that for a future update.
All these changes gave me a solid frame rate with a crazy number of fish. More realistically, with a target of 800-1,200 fish, I was getting 600 FPS. As an added bonus, the standalone stress test makes a very cool screensaver.
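The post doesn't give the actual probabilities or delays, so this is a hypothetical C# sketch of the two tweaks with made-up parameters. OrganicTweaks, KeepSwimmingStraight and NextDelayMs are illustrative names. Since System.Random is not thread safe, each worker thread would own its own instance.

```csharp
using System;

// Hypothetical sketch of two "organic" tweaks for the AI loop.
// Use one instance per thread: System.Random is not thread safe.
public sealed class OrganicTweaks
{
    private readonly Random rng;
    private readonly double keepStraightChance; // probability in [0, 1]
    private readonly int minDelayMs;
    private readonly int maxDelayMs;

    public OrganicTweaks(int seed, double keepStraightChance, int minDelayMs, int maxDelayMs)
    {
        rng = new Random(seed);
        this.keepStraightChance = keepStraightChance;
        this.minDelayMs = minDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    // True if a fish should skip the boid rules this pass and keep its current heading.
    public bool KeepSwimmingStraight() => rng.NextDouble() < keepStraightChance;

    // Random pause in milliseconds (inclusive range) to pass to Thread.Sleep between passes.
    public int NextDelayMs() => rng.Next(minDelayMs, maxDelayMs + 1);
}
```

In the worker loop, a fish's rules would only be recomputed when KeepSwimmingStraight() returns false, and the thread would call Thread.Sleep(NextDelayMs()) after each pass. Skipping some updates also saves a little CPU time.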
So what do the fish look like in game?
Here are a few examples of the fish in-game. I still need to make a "sea life controller" that will spawn shoals around the player and remove those outside camera range. I also added a few tweaks, like update rates, so that as the player gets further away the update rate decreases to save processor time.
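One way such distance-based update rates could work is sketched below in C#. ShoalUpdateRate, its distance bands and intervals are all hypothetical placeholders, not the game's tuned values; in Unity, deltaTime would come from Time.deltaTime.

```csharp
// Hypothetical sketch of distance-based AI update rates for a shoal.
// Distances and intervals are placeholders, not tuned values.
public static class ShoalUpdateRate
{
    // Seconds to wait between AI updates for a shoal this far from the player.
    public static float IntervalSeconds(float distanceToPlayer)
    {
        if (distanceToPlayer < 20f) return 0f;     // close: every frame
        if (distanceToPlayer < 50f) return 0.1f;   // mid range: ~10 times a second
        if (distanceToPlayer < 100f) return 0.25f; // far: ~4 times a second
        return 1f;                                 // very far: once a second
    }

    // Call once per frame; returns true when the shoal's AI should run this frame.
    public static bool ShouldUpdate(ref float timeSinceUpdate, float deltaTime, float distanceToPlayer)
    {
        timeSinceUpdate += deltaTime;
        if (timeSinceUpdate < IntervalSeconds(distanceToPlayer)) return false;
        timeSinceUpdate = 0f;
        return true;
    }
}
```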
Other changes for this week
After adding the fish, I spent some time reworking the light attenuation shader that makes the water darker as it gets deeper. This really improved the overall look of the ocean and improved visibility in the deep. Check out the video if you haven't already.
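The post doesn't describe the shader's curve, so here is a hypothetical C# sketch of one common approach: exponential (Beer-Lambert style) falloff with depth, with made-up per-channel absorption so that red fades fastest, as it does in real sea water. In a real shader this would run per pixel in shader code; DepthAttenuation and its values are illustrative only.

```csharp
using System;
using System.Numerics;

// Hypothetical sketch of exponential depth attenuation (Beer-Lambert style).
// Colours are linear RGB stored in a Vector3 (X = R, Y = G, Z = B).
public static class DepthAttenuation
{
    // Made-up absorption per metre: red fades fastest, blue slowest.
    public static readonly Vector3 Absorption = new Vector3(0.20f, 0.05f, 0.03f);

    public static Vector3 Attenuate(Vector3 colour, float depthMetres)
    {
        float d = Math.Max(0f, depthMetres); // at or above the surface: no darkening
        return new Vector3(
            colour.X * (float)Math.Exp(-Absorption.X * d),
            colour.Y * (float)Math.Exp(-Absorption.Y * d),
            colour.Z * (float)Math.Exp(-Absorption.Z * d));
    }
}
```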
That’s all for this update, thanks for reading this huge post. As always, your suggestions and feedback are welcome and very needed to improve the game. Please get in touch and tell me what you think.