Improving a sprite-based rendering procedure

Sat Jan 19, 2013

So you are developing a 2D game and, suddenly, you discover that the rendering procedure is slow. What would you do? Would you put fewer elements in your levels, or would you render them more efficiently? Since you are here, I suppose that you would pick the latter. So, here we go: I am going to tell you how to render fewer things while having more.

The first part of this post describes how much work is done during the rendering of a scene. Then I will compare it with the results of an improved procedure that is easy to implement. Finally, I will give you pointers to an even better procedure. This final procedure has been used in Andy’s Super Great Park and in Plee the Bear (a work in progress).

Background

In the first versions of Plee the Bear we were not really worried about the speed of the rendering procedure, nor the speed of any other procedure. Keeping in mind that premature optimization is the root of all evil, we had to make things work before making them work fast. That was some years ago. Then the game grew, we began to put a lot of stuff in the levels, and finally the time came to think about speeding things up. That is the subject of today’s post: how the rendering procedure evolved as the game grew.

The initial procedure was as simple as possible. Elements were rendered from the background to the foreground, as is, with no attempt to skip anything. Having something drawn on the screen was a sufficient result at that time.

So, how much work does this procedure do? Let’s see how many times each pixel of the screen is written in a given scene. We will use the very beginning of the first act of the forest of Plee the Bear, just when the player can start to control Plee:

The moment of the game used for the experiments.

And we will apply the following color representation:

Here is the result for the aforementioned scene:

The number of times each pixel is written. The graph on the right
shows the percentage of the pixels for each number of passes (range is
(0, 0.5)).

Not surprisingly, with three layers of rain plus the background, each pixel is written at least 4 times, most of them 5 or 6 times and some are written up to 9 times. And once the rain is gone, the range goes from 1 to 6 writings:

Number of writings once the rain is gone.

An interesting thing in these two pictures is that even parts hidden by the middle ground decorations are rendered.
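The kind of measurement shown in these pictures can be sketched in a few lines. The code below is only an illustration, not the code of the game engine: it assumes that every element is reduced to an axis-aligned box given as `(left, top, right, bottom)` with exclusive right and bottom edges, and the names `count_writes` and `histogram` are made up for this example.

```python
from collections import Counter


def count_writes(width, height, boxes):
    """Return a grid holding, for each pixel, the number of times it is written.

    `boxes` lists the rendered elements from the background to the
    foreground, each one as (left, top, right, bottom), right and bottom
    being exclusive. This is an assumed representation for the example.
    """
    writes = [[0] * width for _ in range(height)]
    for left, top, right, bottom in boxes:
        # Clamp the box to the screen, then mark every pixel it covers.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                writes[y][x] += 1
    return writes


def histogram(writes):
    """Return the fraction of the pixels for each number of writes."""
    counts = Counter(n for row in writes for n in row)
    total = sum(counts.values())
    return {n: c / total for n, c in sorted(counts.items())}


# A tiny 8x4 "screen" rendered the naive way.
scene = [
    (0, 0, 8, 4),  # background covering the whole screen
    (0, 0, 8, 4),  # a full-screen layer of rain
    (2, 1, 6, 3),  # a middle ground decoration
]
print(histogram(count_writes(8, 4, scene)))  # {2: 0.75, 3: 0.25}
```

Even in this toy scene no pixel is written fewer than two times, and the pixels under the decoration are written three times although only the decoration is visible there.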

Improving the rendering procedure

The improvement we wanted to introduce then was to avoid rendering elements that will be hidden by other elements. The idea is to maintain a representation of the empty parts of the screen whilst considering the elements from the foreground toward the background. For each element there are two steps. First, if the element intersects the empty parts of the screen, we split it into sub-elements that cover only the empty parts of the screen. Then, if the initial element is opaque, we remove its box from the empty parts of the screen.

To keep things simple, we represent the parts of the screen with axis-aligned boxes. Elements are considered as opaque if there is no alpha transparency in the source image and if they are not rotated.

The different steps of the algorithm.
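The two steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual code of the engine: boxes are `(left, top, right, bottom)` tuples with exclusive right and bottom edges, each element comes with a precomputed `is_opaque` flag, and the function names are invented for the example.

```python
def intersect(a, b):
    """Return the intersection of two boxes, or None if they do not overlap."""
    left, top = max(a[0], b[0]), max(a[1], b[1])
    right, bottom = min(a[2], b[2]), min(a[3], b[3])
    if left < right and top < bottom:
        return (left, top, right, bottom)
    return None


def subtract(box, hole):
    """Split `box` into at most four boxes covering `box` minus `hole`."""
    cut = intersect(box, hole)
    if cut is None:
        return [box]
    left, top, right, bottom = box
    parts = []
    if top < cut[1]:
        parts.append((left, top, right, cut[1]))        # strip above the hole
    if cut[3] < bottom:
        parts.append((left, cut[3], right, bottom))     # strip below the hole
    if left < cut[0]:
        parts.append((left, cut[1], cut[0], cut[3]))    # strip on the left
    if cut[2] < right:
        parts.append((cut[2], cut[1], right, cut[3]))   # strip on the right
    return parts


def visible_parts(screen, elements):
    """Return the (element index, sub-box) pairs that must be drawn.

    `elements` is a list of (box, is_opaque) pairs ordered from the
    background to the foreground. The result is in drawing order,
    background first.
    """
    empty = [screen]  # parts of the screen not yet covered by an opaque box
    result = []
    for index in range(len(elements) - 1, -1, -1):  # foreground first
        box, is_opaque = elements[index]
        # Step 1: keep only the parts of the element lying in empty regions.
        for region in empty:
            part = intersect(box, region)
            if part is not None:
                result.append((index, part))
        # Step 2: an opaque element hides everything behind its box.
        if is_opaque:
            empty = [p for region in empty for p in subtract(region, box)]
    result.reverse()
    return result


elements = [
    ((0, 0, 8, 4), True),   # opaque background
    ((0, 0, 8, 4), False),  # translucent layer of rain
    ((2, 1, 6, 3), True),   # opaque middle ground decoration
]
parts = visible_parts((0, 0, 8, 4), elements)
print(sum((r - l) * (b - t) for _, (l, t, r, b) in parts))  # 56
```

On this toy scene, 56 pixels are written instead of the 72 written when every element is drawn in full. The empty regions are pairwise disjoint, so the sub-boxes produced for a given element never overlap each other.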

Let’s come back to the game to see how many times the pixels are written with this procedure. During the rain:

Number of writings per pixel during the rain, with the
improved rendering procedure.

Pixels are written from 2 to 8 times. Unlike with the original procedure, some of them are now drawn only 2 or 3 times. The number of pixels drawn more than 3 times has been greatly reduced. And after the rain:

Number of writings per pixel once the rain is gone, with the improved
rendering procedure.

Here the range becomes 1 to 5 writings per pixel, and most pixels are written 1 or 2 times. Unlike with the original algorithm, we have more pixels written once than three times.

The benchmark

Finally, for all this work to be useful there must be an increase in performance, that is, more frames rendered per second. To keep a uniform sequence of rendered items among the tests, we use a demo script that runs in the game. Here are the results:

Number of frames per second during the demo.

One can see that the new procedure greatly increases the number of frames per second, which is exactly what we wanted.

Can we have more?

Yes! We can do better. You may have noticed in the above captures that some parts of the screen seem to be written several times even if the foreground seems opaque. The main reason is that these foreground sprites have some transparent pixels on one of the edges of their box. Thus, the procedure does not consider any opaque box for them.

In order to improve this, we just have to compute some kind of opaque box inside each sprite. More precisely, we want the largest opaque box of each sprite. Is it easy to compute? Well, it may be easy if you reformulate the problem as finding the largest rectangle with no transparent pixel. You now have an instance of the well-known Largest Empty Rectangle problem, for which you will find good resources, such as the article Computing the Largest Empty Rectangle on One- and Two-Dimensional Processor Arrays by Frank Dehne.
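To give an idea of what such a computation looks like, here is a minimal sketch based on the classic histogram-and-stack technique for finding the largest rectangle in a binary grid. It is not the algorithm of the article cited above, nor necessarily the one used in our level editor. It assumes the sprite has already been turned into rows of booleans, `True` meaning a fully opaque pixel, and the name `largest_opaque_box` is made up for the example.

```python
def largest_opaque_box(opaque):
    """Return the largest all-opaque box as (left, top, right, bottom).

    `opaque` is a list of rows of booleans, True for a fully opaque pixel.
    Right and bottom are exclusive. Returns None if no pixel is opaque.
    """
    if not opaque:
        return None
    width = len(opaque[0])
    heights = [0] * width  # run of opaque pixels ending at the current row
    best_area, best_box = 0, None
    for y, row in enumerate(opaque):
        for x in range(width):
            heights[x] = heights[x] + 1 if row[x] else 0
        stack = []  # columns whose heights are strictly increasing
        for x in range(width + 1):
            # A sentinel height of zero past the last column empties the stack.
            current = heights[x] if x < width else 0
            while stack and heights[stack[-1]] >= current:
                h = heights[stack.pop()]
                left = stack[-1] + 1 if stack else 0
                area = h * (x - left)
                if area > best_area:
                    best_area = area
                    best_box = (left, y - h + 1, x, y + 1)
            stack.append(x)
    return best_box


# A sprite whose four corners are transparent.
sprite = [
    [False, True, True, True, False],
    [True,  True, True, True, True],
    [True,  True, True, True, True],
    [False, True, True, True, False],
]
print(largest_opaque_box(sprite))  # (1, 0, 4, 4)
```

With such a box, a sprite like this one can hide what lies behind its central 3x4 area even though its full bounding box is not opaque.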

Contrary to the previous procedure, this one cannot be executed at run time (unless you accept that the levels take several minutes to load). For our games, we managed to insert the procedure in the level editor, as an optimization step executed when the level is compiled. Then the game engine just has to read the computed opaque boxes and apply them in the initial procedure.