This is my dissertation for the MSc Games Programming at the University of Hull. I was given a list of dissertations to pick from and, although a few were very interesting, none of the projects involved the PS3, which I found fascinating. So instead of picking one from the list, I proposed my own idea: a ray tracer on the PS3. This was quite ambitious, as I had never programmed the PS3 before, nor had I written a ray tracer.
My main goal was to make the program fully parallel and use the entire Cell Broadband Engine (CBE). If you are unfamiliar with the PS3's architecture, a brief overview can be found here. If you would like more detail about this project, feel free to contact me.
Here we have a Cornell box, a scene commonly used to demonstrate reflections and refractions. This scene runs at ~2.3 frames per second (FPS). The program features:
• Reflections (left-most sphere).
• Refractions with Beer's law (large right sphere).
• Phong shading with Schlick's approximation (diffuse on the near red sphere, specular on the large right sphere).
• Hard shadows.
• Aspect ratio correction.
• Sphere and plane primitives.
• BVH acceleration structure.
• Adaptive depth cut-off.
• Full use of all the PS3's SPUs and the PPU, with load balancing.
• PS3 controller support, enabling the user to:
  • Change the ray depth cut-off.
  • Toggle reflections.
  • Toggle refractions.
  • Pitch and yaw the camera.
  • Move the camera's position.
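Two of the shading ingredients listed above can be sketched in a few lines. This is a minimal illustration rather than the dissertation's actual code; the function names and constants are mine.

```cpp
#include <cmath>

// Schlick's approximation to the Fresnel reflectance for light crossing
// from a medium of refractive index n1 into one of index n2, given the
// cosine of the incident angle.
double schlick(double cosTheta, double n1, double n2) {
    double r0 = (n1 - n2) / (n1 + n2);
    r0 *= r0;  // reflectance at normal incidence
    return r0 + (1.0 - r0) * std::pow(1.0 - cosTheta, 5.0);
}

// Beer's law: light travelling `distance` through an absorbing medium is
// attenuated exponentially; applied per colour channel in practice.
double beerAttenuation(double absorbance, double distance) {
    return std::exp(-absorbance * distance);
}
```

At normal incidence (cosTheta = 1) Schlick reduces to the base reflectance, e.g. ~0.04 for an air-to-glass boundary, and it climbs towards 1 at grazing angles, which is why the large sphere's rim reflects more strongly than its centre.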
This scene is made to demonstrate colours being added together in reflections. The spheres are arranged as a six-sided colour wheel, and each gains colour from the reflections of the others. For example, the reflection of a white sphere on a red sphere appears pinkish. This scene runs at ~2.9 FPS.
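The additive blend described above can be sketched as follows. This is an assumed formulation (the reflectivity factor and clamping are my own illustrative choices, not taken from the dissertation):

```cpp
#include <algorithm>

struct Colour { double r, g, b; };

// Additive reflection blend: the reflected colour, scaled by a
// reflectivity factor, is added onto the surface colour and clamped to
// the displayable [0, 1] range.
Colour addReflection(Colour surface, Colour reflected, double reflectivity) {
    auto clamp01 = [](double v) { return std::min(1.0, std::max(0.0, v)); };
    return { clamp01(surface.r + reflectivity * reflected.r),
             clamp01(surface.g + reflectivity * reflected.g),
             clamp01(surface.b + reflectivity * reflected.b) };
}
```

With a pure red surface (1, 0, 0) reflecting white (1, 1, 1) at reflectivity 0.5, the result is (1.0, 0.5, 0.5): the pinkish tint mentioned above.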
This scene also shows reflections, but its main purpose is to show the ray tracer running at higher speeds: it runs at ~7.8 FPS. Considering the screen resolution is 1280x720 and the PS3 is clocked at 3.2 GHz, these frame rates are respectable.
BVHs are very complicated and time-consuming to implement; building a particularly effective one could easily have taken the whole duration of the project. Therefore, my implementation was kept relatively simple. The image above shows the lowest layer of the BVH in action on the testing scene. The green spheres are the test primitives that have bounding volumes created for them. The purple (actually blue, but red gets added to their colour) larger spheres around the green spheres are their bounding volumes at the lowest level of the BVH. Above them are the purple nodes' bounding volumes, the even larger red spheres. The turquoise sphere in the centre of the image is the calculated centre of the scene (the average of all primitive positions); it is used as a reference for the top layer.
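One plausible way to build each parent node in such a hierarchy is to centre it on the mean of its children's centres and grow its radius until every child sphere fits inside. This is a sketch of the idea, not the dissertation's exact construction:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Sphere { double x, y, z, radius; };

// Build a parent bounding sphere that fully encloses all child spheres:
// centre it on the average of the child centres, then expand the radius
// to cover the farthest child surface.
Sphere enclose(const std::vector<Sphere>& children) {
    Sphere parent{0.0, 0.0, 0.0, 0.0};
    for (const Sphere& c : children) {
        parent.x += c.x; parent.y += c.y; parent.z += c.z;
    }
    const double n = static_cast<double>(children.size());
    parent.x /= n; parent.y /= n; parent.z /= n;
    for (const Sphere& c : children) {
        const double dx = c.x - parent.x;
        const double dy = c.y - parent.y;
        const double dz = c.z - parent.z;
        const double reach = std::sqrt(dx*dx + dy*dy + dz*dz) + c.radius;
        parent.radius = std::max(parent.radius, reach);
    }
    return parent;
}
```

For two unit spheres at (-2, 0, 0) and (2, 0, 0) this gives a parent centred at the origin with radius 3, just touching both children's far sides.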
The top layer uses the centre of the scene as a reference to split the world into eight giant zones, similar to the top layer of an octree. If any nodes are present in a zone, a bounding volume (node) is made for them and they all become its sub-nodes. The yellow spheres in the image are these zones / top-layer nodes. To conclude: the BVH is created before ray tracing begins and doesn't change as long as the scene is static. Each ray is then tested against the BVH in this order: first the yellow spheres, next the red spheres, then the purple(ish) spheres, and finally the primitives inside the purple spheres, to find the collided object. This also works for rays originating inside a bounding volume, so the method accelerates not only primary rays but also secondary rays such as reflections and refractions.
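The eight-zone split amounts to a per-axis comparison against the scene centre, exactly as in the first subdivision of an octree. A minimal sketch (names are mine):

```cpp
// Classify a node into one of the eight top-layer zones by comparing its
// centre against the scene centre on each axis. Each axis contributes
// one bit, giving a zone index in the range 0..7.
int octantIndex(double cx, double cy, double cz,    // scene centre
                double px, double py, double pz) {  // node centre
    int index = 0;
    if (px >= cx) index |= 1;
    if (py >= cy) index |= 2;
    if (pz >= cz) index |= 4;
    return index;
}
```

Nodes that map to the same index are then grouped under one top-layer bounding volume, so at most eight yellow spheres exist.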
As outlined above, a homogeneous approach was most appropriate and was taken: each of the five SPEs available to the programmer has its own ray tracer. At the start of the program the BVH is constructed by the PPE, which then passes it to each of the SPEs via a DMA transfer. Each SPE ran its ray tracer using a "task list", which is quite similar to a thread in that it can run for the entirety of the program and is suited to larger-grain tasks. Since each SPE had its own ray tracer in a thread-like object, synchronisation had to be implemented manually. This was the core point of the dissertation, and I took great joy in implementing this part.
Here we have an image of the refraction scene with a grid (to scale) drawn on top. The number in each cell is that cell's ID. A shared variable represents the current progress on the image: for example, if the variable is twenty-one, every cell before cell twenty-one has been processed. Each SPE checks the current progress by reading this variable (safely, via libatomic) and then increments it by the number of cells it takes. The number of cells is configurable because of two competing pressures: each SPE should stay busy for as long as possible and transfer data as infrequently as possible, but each SPE also has a small memory capacity and can only buffer so much output data before syncing it back to the image. I therefore made the number of blocks / cells (the area of the image) each SPE takes adjustable, so that the optimum number of cells per SPE could be found.
For example, the start of a frame happens like so: SPE 1 comes along, sees that nought is the next free cell, and changes the next free cell to four. SPE 2 then comes along, sees that cell four is free, and increments the next free cell to eight, and so on until the image is finished. When an SPE finishes its cells, it informs the PPE that its job is done and copies its output buffers into a buffer on the PPE, which then updates the final image. After that, it goes back for another task.
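The claim protocol above can be expressed with an atomic fetch-and-add. The PS3 build used libatomic on a shared variable in main memory; `std::atomic` here is a stand-in for illustration, and the names are mine:

```cpp
#include <atomic>

// Shared progress counter: the ID of the next unclaimed cell.
std::atomic<int> nextFreeCell{0};

// Atomically claim `cellsPerTake` consecutive cells. Returns the first
// claimed cell ID, or -1 if the whole image has already been handed out.
// fetch_add returns the old value, so each worker sees a unique range
// even when several claim at the same time.
int claimCells(int cellsPerTake, int totalCells) {
    int first = nextFreeCell.fetch_add(cellsPerTake);
    return first < totalCells ? first : -1;
}
```

With `cellsPerTake` set to four, successive claims return 0, 4, 8, ... — matching the SPE 1 / SPE 2 walkthrough above — and -1 once the frame is exhausted, at which point a worker would wait for the next frame.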