The rasterization problem.
I’m constantly surprised at how few people realize that small polygons kill rasterization. If you’ve ever had the joy of writing a software rasterizer, it becomes apparent almost immediately that small triangles are extremely hard to optimize for, and the same problem carries over to graphics cards. In this blog post, I’m going to explain exactly why this is!
So what are the advantages of rasterization? It turns out that there are many! The biggest strength of rasterization is memory access. In low-poly graphics (such as games), a single polygon tends to cover a large number of pixels. This means that any textures assigned to that polygon are going to be accessed in a fairly localized way (assuming you have mipmapping turned on), and the end result is a streaming memory-access pattern… texture data is read in a mostly linear way, and the shaded pixels are written out in a mostly linear way. This effectively turns rasterization into a nice streaming operation, which both memory systems and processors are set up to handle very, very efficiently. Because of this, modern GPUs contain large banks of processors designed to exploit these patterns. The end result is that images can be rendered *very* quickly.
So why do small triangles kill rasterization? For starters, you can’t assume that a single polygon will generate a large number of pixels. This means that having a huge bank of processors dedicated to pixel processing is no longer an effective optimization. For instance, let’s say you have a GPU with 24 fragment shading units (or, as Direct3D calls them, pixel shading units). If a triangle covers only one pixel, then 23 of the 24 fragment shading units have nothing to do for that triangle, so roughly 96% of the shading hardware sits idle. As you can probably see, this is not the most efficient way to render a frame.
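To put a number on the wasted work, here is a rough sketch in Python. It assumes, purely for illustration, that the whole bank of shading units works on pixels from one triangle at a time, so a triangle occupies whole "waves" of the bank and any leftover slots in the last wave sit idle. The function `shading_utilization` and its parameters are hypothetical names, not part of any GPU API.

```python
import math


def shading_utilization(pixels_covered: int, num_units: int = 24) -> float:
    """Fraction of shading-unit slots doing useful work for one triangle.

    Toy model (an assumption): the units only ever process pixels from a
    single triangle, so the triangle occupies ceil(pixels / units) full
    "waves" of the whole bank, and leftover slots in the last wave sit idle.
    """
    if pixels_covered <= 0:
        raise ValueError("a triangle must cover at least one pixel")
    waves = math.ceil(pixels_covered / num_units)
    return pixels_covered / (waves * num_units)


for pixels in (1, 4, 24, 25, 1000):
    print(f"{pixels:5d} pixels -> {shading_utilization(pixels):6.1%} of units busy")
```

Under this model a one-pixel triangle keeps about 4% of the units busy, while a 1,000-pixel triangle keeps about 99% busy. Real GPUs schedule work quite differently, so treat this as an illustration of the trend only.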
In order to deal with this problem, recent GPUs have been designed with generic (unified) processors that can switch back and forth between vertex processing and pixel processing. While this helps, it’s not a full solution. For instance, you can’t assume that triangles won’t overlap, so you can’t freely process multiple triangles at the same time. You also can’t assume that their memory access will be reasonable. In the worst case, you end up scattering pixels all over the framebuffer, which means texture data will likely be gathered from all over texture memory as well. Remember that nice streaming pattern I was talking about? That’s totally gone in this case, and performance suffers severely as a result. So what now?
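The loss of the streaming pattern can be illustrated with a toy cache model. The sketch below counts misses in a small fully associative LRU cache when texels are fetched in order (one large triangle) versus at scattered locations (many tiny triangles spread over the screen). The cache size, 64-byte line size, texel size and texture dimensions are all assumptions chosen for illustration, not the parameters of any real GPU.

```python
import random
from collections import OrderedDict


def count_cache_misses(addresses, line_bytes=64, cache_lines=256):
    """Count misses in a toy fully associative LRU cache of cache_lines lines."""
    cache = OrderedDict()
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in cache:
            cache.move_to_end(line)  # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


TEXEL_BYTES = 4               # assumed RGBA8 texels
TEXTURE_TEXELS = 1024 * 1024  # assumed 1024x1024 texture
FETCHES = 4096

# One large triangle: neighboring pixels fetch neighboring texels.
linear = [i * TEXEL_BYTES for i in range(FETCHES)]

# Many tiny, scattered triangles: each fetch lands somewhere else in the texture.
rng = random.Random(0)
scattered = [rng.randrange(TEXTURE_TEXELS) * TEXEL_BYTES for _ in range(FETCHES)]

print("in-order fetches, misses: ", count_cache_misses(linear))
print("scattered fetches, misses:", count_cache_misses(scattered))
```

With these parameters the in-order fetches miss only once per 64-byte line (256 misses for 4,096 fetches), while nearly every scattered fetch misses.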
Curved surfaces with displacement maps are one possible solution to this problem. The curved surfaces are adaptively subdivided into triangles, which keeps the triangles a relatively uniform size in screen space (and prevents them from becoming tiny). This makes those huge banks of processors a good optimization strategy once again, and it largely brings back the streaming behavior seen when rendering large triangles. However, this isn’t a bulletproof solution either. While games would technically be able to represent more detail with less data, we’d still run into the small triangle problem with very distant surfaces (in this case, we’d have small patches instead of small triangles). Also, evaluating curved surfaces is expensive, and displacement mapping only adds to that cost. All is not lost, however. Since GPU performance is still growing quickly, this will likely become the most popular rendering technique for games in the near future.
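One common way to drive adaptive subdivision is to pick a tessellation factor for each patch edge from that edge's projected length in pixels. The sketch below assumes a simple pinhole camera (the hypothetical `focal_px` parameter), a target of about 8 pixels per segment, and an assumed cap of 64 segments; none of these values come from this post. It also shows the small-patch caveat: far enough away, an edge needs no subdivision at all and is still only about a pixel long.

```python
import math


def project(point, focal_px=1000.0):
    """Pinhole projection of a camera-space point (x, y, z), z > 0, to pixels."""
    x, y, z = point
    return (focal_px * x / z, focal_px * y / z)


def edge_tess_factor(p0, p1, target_px=8.0, max_factor=64):
    """Segments for a patch edge so each spans roughly target_px pixels.

    Returns (factor, projected_length_in_pixels). The factor is clamped to
    [1, max_factor], where max_factor is an assumed upper limit.
    """
    (ax, ay), (bx, by) = project(p0), project(p1)
    length_px = math.hypot(bx - ax, by - ay)
    factor = math.ceil(length_px / target_px)
    return max(1, min(max_factor, factor)), length_px


# A one-unit edge facing the camera at increasing depths.
for depth in (1.0, 10.0, 1000.0):
    factor, length = edge_tess_factor((0.0, 0.0, depth), (1.0, 0.0, depth))
    print(f"z = {depth:6.1f}: edge spans {length:6.1f} px -> {factor} segments")
```

For a one-unit edge this gives 64 segments (clamped from 125) at depth 1, 13 segments at depth 10, and a single segment spanning just one pixel at depth 1000, which is exactly the distant small-patch case.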
Another technique aimed at dealing with this problem is sparse voxel octrees. This particular representation has the added advantage of allowing global scene access within pixel shaders, so it might very well become the technology that replaces rasterization as the most popular rendering algorithm for games. Only time will tell, I suppose. =)
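To make the data structure a bit more concrete, here is a minimal, CPU-side sketch of a sparse voxel octree point query in Python. Only non-empty octants are stored, and any point in the scene can be looked up by descending from the root, which is the property behind the global scene access mentioned above. The node layout and the names `OctreeNode`, `insert`, `lookup` and `count_nodes` are illustrative assumptions; practical GPU implementations typically use a compact, array-based node layout and cast rays through the tree.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class OctreeNode:
    # Only non-empty octants are stored; missing children mean empty space.
    children: Dict[int, "OctreeNode"] = field(default_factory=dict)
    color: Optional[Tuple[int, int, int]] = None  # payload stored at leaves


def _step(x, y, z):
    """Pick the child octant containing (x, y, z) in [0, 1)^3 and remap the
    point into that child's own [0, 1)^3 coordinates."""
    ix, iy, iz = int(x >= 0.5), int(y >= 0.5), int(z >= 0.5)
    index = ix | (iy << 1) | (iz << 2)
    return index, (2 * x - ix, 2 * y - iy, 2 * z - iz)


def insert(root, point, depth, color):
    """Store color in the voxel of size 2**-depth that contains point."""
    node, (x, y, z) = root, point
    for _ in range(depth):
        index, (x, y, z) = _step(x, y, z)
        node = node.children.setdefault(index, OctreeNode())
    node.color = color


def lookup(root, point):
    """Return the color of the leaf voxel containing point, or None if empty."""
    node, (x, y, z) = root, point
    while node.children:
        index, (x, y, z) = _step(x, y, z)
        if index not in node.children:
            return None  # this octant was never filled: empty space
        node = node.children[index]
    return node.color


def count_nodes(node):
    """Total number of nodes in the subtree rooted at node."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


root = OctreeNode()
insert(root, (0.9, 0.1, 0.1), depth=3, color=(255, 0, 0))
print(lookup(root, (0.88, 0.05, 0.12)))  # a point in the same 1/8-sized voxel
print(lookup(root, (0.1, 0.1, 0.1)))     # a point in empty space
print(count_nodes(root))                 # root plus one node per level
```

Storing a single voxel three levels deep allocates only one node per level, while the empty seven-eighths of every level costs nothing; that sparsity is what keeps the representation affordable for large scenes.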

You’re a total tool and nobody should ever listen to you!
Thanks for the feedback… =P
There’s some research at Stanford on how to make GPU rendering of micro-polygons more efficient.
http://graphics.stanford.edu/papers/fragmerging/shade_sig10.pdf
The approach advocated in this paper is to add a small amount of hardware between the rasterization block and the fragment shader that can merge the generated quads when they are the result of adjacent non-overlapping triangles.
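A toy model of the quad-merging idea described in this comment: GPUs shade pixels in 2x2 quads, so a triangle that covers a single pixel still pays for four shader invocations. The sketch below counts invocations with and without merging, assuming (as the comment says) that the triangles are adjacent and non-overlapping, so their partial quads can simply be combined. It is a simplification for illustration, not the hardware algorithm from the paper; the function names and coverage sets are hypothetical.

```python
def quads_touched(pixels):
    """Set of 2x2 quads (indexed by (x // 2, y // 2)) containing covered pixels."""
    return {(x // 2, y // 2) for x, y in pixels}


def shading_cost(triangles, merge=False):
    """Pixel-shader invocations, assuming every touched quad shades all 4 pixels."""
    if merge:
        # Toy merging: the triangles are assumed adjacent and non-overlapping,
        # so partial quads from different triangles can share one quad.
        return 4 * len(quads_touched(set().union(*triangles)))
    # Without merging, each triangle launches its own quads.
    return sum(4 * len(quads_touched(t)) for t in triangles)


# Hypothetical coverage: eight one-pixel triangles tiling a 4x2 pixel region.
strip = [{(x, y)} for x in range(4) for y in range(2)]
covered = sum(len(t) for t in strip)
print(covered, shading_cost(strip), shading_cost(strip, merge=True))
```

Here eight covered pixels cost 32 invocations when each triangle is shaded on its own, but only 8 once the partial quads are merged.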
Nice! I’m curious to see how this deals with the memory access problems. I could see this being a solution for the computation side, but it doesn’t seem to address the memory coherency problems. I’d be interested to see an analysis of this, in hardware, on real-world texture-heavy data. I suppose the fact that this comes from tessellated geometry seems to imply that the texture data is fairly coherent, which would minimize the problem. However, it doesn’t address the issue of small patches… it instead describes a way to make the small triangles produced by tessellation render quickly. All in all, I’d expect this to get implemented in hardware (if it isn’t already). This is a great link, and I recommend it to anyone interested in the issue of rasterizing micro-polygons!
A great read.
Hi Matt, thanks for the feedback! Let me know if there are any topics you’d like me to write about! =)