Understanding Ray Generation for NeRFs
Pinhole Camera Model and Ray Generation for Neural Radiance Fields
Equations in the Camera Coordinate System
A crucial element in rendering is the camera model and how rays are generated and cast from the camera into the scene. This tutorial focuses on the pinhole camera model. Let us consider a camera of resolution (H, W) and use the coordinates (u, v) to index the pixels, with the origin located at the top left of the image (see below for an illustration). Another vital parameter of the camera is its focal length, which controls zooming in and out. Below, you’ll find renderings of a cube in Blender at various focal lengths.
The focal length f determines the distance between the image plane and the center of the camera coordinate system. Consequently, the z-coordinate of each pixel is -f. Next, let us compute the x and y coordinates of each pixel (u, v). Since the origin of the x and y axes is at the center of the image (see below), and the y-axis points in the opposite direction to the v-axis, the x and y coordinates of pixel (u, v) are (u - W/2, H/2 - v).
This can easily be put into code by writing a function that evaluates these equations for each pixel of the camera (H * W pixels in total). In rendering, it is common practice to work with normalised ray directions, so it is important to remember to normalise the resulting direction vectors.
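Here is a minimal sketch of such a function in NumPy (the name generate_rays and its exact signature are my own assumptions, not necessarily the code used in the original tutorial):

```python
import numpy as np

def generate_rays(H, W, focal):
    """Generate one ray per pixel in the camera coordinate system.

    Minimal sketch: the camera sits at the origin and looks down the
    negative z-axis, with the image plane at z = -focal.
    """
    # Pixel grid: u indexes columns, v indexes rows (origin at top-left).
    u, v = np.meshgrid(np.arange(W), np.arange(H), indexing="xy")

    # Shift the origin to the image centre and flip the y-axis so that
    # y points up, as derived above: x = u - W/2, y = H/2 - v, z = -f.
    dirs = np.stack([u - W / 2.0,
                     H / 2.0 - v,
                     -focal * np.ones_like(u, dtype=np.float64)], axis=-1)

    # Normalise the directions so that t in r(t) = o + t*d is a distance.
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # All rays share the same origin: the camera centre.
    origins = np.zeros_like(dirs)
    return origins, dirs
```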
Now we can test our code by setting up a scene with a basic red sphere, projecting our rays into the scene, and visualising the rendered image. The primary focus of this tutorial is ray generation rather than rendering, but if you’re keen on delving deeper into rendering fundamentals and exploring rendering with NeRFs, you might be interested in my course on NeRFs.
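As a rough illustration of such a test (not the tutorial's actual renderer), here is a toy ray-sphere intersection check built on the hypothetical generate_rays sketch above; it simply asks whether the quadratic for the intersection of r(t) = o + t * d with the sphere has a non-negative real root:

```python
import numpy as np

def ray_hits_sphere(origins, dirs, centre, radius):
    """Boolean mask of rays that hit a sphere (toy test, assumes |d| = 1)."""
    oc = origins - centre
    b = np.sum(dirs * oc, axis=-1)                 # half the linear coefficient
    c = np.sum(oc * oc, axis=-1) - radius ** 2
    disc = b ** 2 - c                              # discriminant of t^2 + 2bt + c = 0
    t_far = -b + np.sqrt(np.maximum(disc, 0.0))    # larger root
    return (disc >= 0.0) & (t_far >= 0.0)

# Colour pixels whose ray hits the sphere red, everything else white.
origins, dirs = generate_rays(H=100, W=100, focal=120)
hit = ray_hits_sphere(origins, dirs, centre=np.array([0.0, 0.0, -4.0]), radius=1.0)
image = np.where(hit[..., None], [1.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```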
Visualisations of the Rays
In addition to rendering, we can also visualise the generated rays. Since the ray directions are normalised, we can calculate the 3D location reached along a ray at a given value of t using the ray equation r(t) = o + t * d. Our visualisation function takes the ray attributes and a value of t, and plots the rays up to that point: it iterates over each ray and draws the line segment between its origin and its location r(t).
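A simple sketch of such a plotting function using matplotlib (again, the name plot_rays and its arguments are assumptions on my part):

```python
import matplotlib.pyplot as plt

def plot_rays(origins, dirs, t):
    """Plot every ray from its origin to its position r(t) = o + t*d."""
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")

    o = origins.reshape(-1, 3)
    d = dirs.reshape(-1, 3)
    end = o + t * d  # position of each ray at parameter t

    # Draw one line segment per ray, from its origin to r(t).
    for start, stop in zip(o, end):
        ax.plot([start[0], stop[0]], [start[1], stop[1]], [start[2], stop[2]])

    plt.show()
```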
Executing this code on a basic camera setup with a height of 20 pixels, a width of 20 pixels, and a focal length of 1200 mm generates the figure below.
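With the hypothetical helpers sketched above, that experiment would look roughly like this:

```python
H, W, focal = 20, 20, 1200
origins, dirs = generate_rays(H, W, focal)
plot_rays(origins, dirs, t=1.0)   # draw each ray up to r(1)
```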
Camera Coordinate System to World Coordinate System
We’ve completed most of the necessary work, but there is still a crucial parameter missing. Currently, we assume that our camera is positioned at the center of the scene, facing in the opposite direction of the z-axis. However, we want the ability to place the camera anywhere in the scene and rotate it in any direction. To achieve this, we will add a camera-to-world (c2w) matrix as input to our function. This 4x4 matrix encodes both the camera position, which becomes the origin of every ray, and the 3x3 rotation that maps directions from the camera coordinate system to the world coordinate system. Once the ray directions are generated in the camera coordinate system, we simply apply this rotation to transform them into world coordinates.
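Building on the generate_rays sketch from earlier, the world-space version might look like this (an illustrative sketch rather than the tutorial's exact code):

```python
import numpy as np

def generate_rays_world(H, W, focal, c2w):
    """Generate rays in world coordinates given a 4x4 camera-to-world matrix."""
    _, dirs_cam = generate_rays(H, W, focal)   # directions in camera space
    R = c2w[:3, :3]                            # 3x3 rotation
    t = c2w[:3, 3]                             # camera position in the world

    # Rotate each direction into world coordinates: d_world = R @ d_cam.
    dirs_world = dirs_cam @ R.T

    # Every ray starts at the camera centre, now expressed in world coordinates.
    origins_world = np.broadcast_to(t, dirs_world.shape).copy()
    return origins_world, dirs_world
```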
We can verify our code by creating a slightly more complex scene consisting of two spheres and then rotating the camera around them (see the sketch below for one way to build such orbiting c2w matrices).
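As a purely illustrative sketch (the orbit_c2w helper is hypothetical), one way to produce such orbiting cameras is to build the c2w matrix from a rotation around the y-axis and a position on a circle around the origin, so the camera always looks back at the scene:

```python
import numpy as np

def orbit_c2w(theta, radius):
    """Camera-to-world matrix for a camera orbiting the world origin.

    The camera is placed on a circle of the given radius in the x-z plane
    and rotated by theta (radians) around the y-axis, so its -z axis keeps
    pointing towards the origin.
    """
    c2w = np.eye(4)
    c2w[:3, :3] = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
                            [ 0.0,           1.0, 0.0          ],
                            [-np.sin(theta), 0.0, np.cos(theta)]])
    c2w[:3, 3] = radius * np.array([np.sin(theta), 0.0, np.cos(theta)])
    return c2w

# Generate rays for several viewpoints around the two-sphere scene.
for theta in np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False):
    origins, dirs = generate_rays_world(H=20, W=20, focal=1200,
                                        c2w=orbit_c2w(theta, radius=4.0))
```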
I hope you found this story helpful! If it provided value to you, please consider showing your appreciation by clapping for this story. Don’t forget to subscribe to stay updated on more tutorials and content related to Neural Radiance Fields and Machine Learning.
Your support is greatly appreciated, and it motivates me to create more useful and informative material. Thank you!
[Udemy Course] | [NeRF Consulting] | [Career & Internships]