Neural Volume Rendering and Surface Rendering

16-825 Learning for 3D: Assignment 3

Author

Ricky Yuan (rickyy)

Published

October 23, 2025

A. Neural Volume Rendering (80 points)

0. Transmittance Calculation (10 points)

Transmittance Calculation

Note that we have:

\[ T(\textbf{x}, \textbf{x}_{t_i}) = T(\textbf{x}, \textbf{x}_{t_{i-1}}) \cdot e^{-\sigma_{t_{i-1}} \Delta t_{i-1}} \]

  1. T(y_1, y_2):

\[ T(y_1, y_2) = e^{-2} = 0.135 \]

  2. T(y_2, y_4):

\[ T(y_2, y_4) = T(y_2, y_3) \cdot T(y_3, y_4) = e^{-0.5} \cdot e^{-30} = 5.676 \times 10^{-14} \]

  3. T(x, y_4):

\[ T(x, y_4) = T(x, y_1) \cdot T(y_1, y_2) \cdot T(y_2, y_3) \cdot T(y_3, y_4) = e^{0} \cdot e^{-2} \cdot e^{-0.5} \cdot e^{-30} = 7.681 \times 10^{-15} \]

  4. T(x, y_3):

\[ T(x, y_3) = T(x, y_1) \cdot T(y_1, y_2) \cdot T(y_2, y_3) = e^{0} \cdot e^{-2} \cdot e^{-0.5} = 0.082 \]
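
These values can be sanity-checked numerically with the recursion above. A minimal sketch, assuming the per-segment optical depths \(\sigma \Delta t\) implied by the exponents are 0 for \(x \to y_1\), 2 for \(y_1 \to y_2\), 0.5 for \(y_2 \to y_3\), and 30 for \(y_3 \to y_4\):

import numpy as np

# Optical depths sigma * delta_t for the segments x->y1, y1->y2, y2->y3, y3->y4
# (assumed from the exponents used in the calculations above).
optical_depths = np.array([0.0, 2.0, 0.5, 30.0])

# T(x, y_i) = exp(-sum of optical depths up to y_i), i.e. the recursion applied repeatedly.
T_x_to_y = np.exp(-np.cumsum(optical_depths))

print(T_x_to_y[1])            # T(y_1, y_2) as well, since T(x, y_1) = 1 -> 0.135
print(np.exp(-(0.5 + 30.0)))  # T(y_2, y_4) ~ 5.68e-14
print(T_x_to_y[3])            # T(x, y_4) ~ 7.68e-15
print(T_x_to_y[2])            # T(x, y_3) ~ 0.082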

1. Differentiable Volume Rendering

1.3. Ray sampling (5 points)

Visualization

Run for visualization:

# mkdir images (uncomment when running for the first time)
python volume_rendering_main.py --config-name=box
XY Grid
Ray Bundle

1.4. Point sampling (5 points)

Visualization

Run for visualization:

python volume_rendering_main.py --config-name=box
Sample points

1.5. Volume rendering (20 points)

Visualization

Color Rendering
Depth

2. Optimizing a basic implicit volume

2.1. Random ray sampling (5 points)

Implemented get_random_pixels_from_image in ray_utils.py.

xy_grid = get_random_pixels_from_image(cfg.training.batch_size, image_size, camera)
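
A minimal sketch of one possible implementation, assuming the sampled pixel locations are returned as normalized coordinates in \([-1, 1]^2\) and that image_size is ordered (W, H) (both are assumptions; the starter code's conventions take precedence):

import torch

def get_random_pixels_from_image(n_pixels, image_size, camera):
    # Sketch only: sample n_pixels random pixel centers and map them to
    # normalized [-1, 1] coordinates.
    W, H = image_size
    x = torch.randint(0, W, (n_pixels,), device=camera.device)
    y = torch.randint(0, H, (n_pixels,), device=camera.device)
    xy_grid = torch.stack(
        [2.0 * (x + 0.5) / W - 1.0, 2.0 * (y + 0.5) / H - 1.0], dim=-1
    )
    return xy_grid  # (n_pixels, 2)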

2.2. Loss and training (5 points)

Run the training with:

python volume_rendering_main.py --config-name=train_box

Below are the center and side lengths of the box after training, rounded to two decimal places.

Box center: (0.25, 0.25, -0.00)
Box side lengths: (2.01, 1.50, 1.50)

2.3. Visualization

Result Visualization from TA’s images
My Result Visualization

3. Optimizing a Neural Radiance Field (NeRF) (20 points)

Implementation

Note:

  1. Use the ReLU activation for the density layer to ensure non-negative values.
  2. Use the Sigmoid activation for the color layer.
  3. Use HarmonicEmbedding for both xyz and direction inputs for positional encoding.
  4. Use the MLPWithInputSkips class as the backbone MLP for generating spatial features.
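
A minimal sketch of this design, with a simplified positional encoding and a plain MLP standing in for HarmonicEmbedding and MLPWithInputSkips (layer sizes are illustrative, and the view-direction input discussed in Section 4.1 is omitted here):

import torch
import torch.nn as nn

def harmonic_embed(x, n_freqs=6):
    # Simplified stand-in for HarmonicEmbedding: [sin(2^k x), cos(2^k x)], k = 0..n_freqs-1.
    freqs = 2.0 ** torch.arange(n_freqs, device=x.device)
    angles = x[..., None] * freqs                                   # (..., 3, n_freqs)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(start_dim=-2)

class TinyNeRF(nn.Module):
    def __init__(self, n_freqs_xyz=6, hidden=128):
        super().__init__()
        # Plain MLP standing in for MLPWithInputSkips (skip connections omitted).
        self.backbone = nn.Sequential(
            nn.Linear(3 * 2 * n_freqs_xyz, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)
        self.color_head = nn.Sequential(nn.Linear(hidden, 3), nn.Sigmoid())
        self.n_freqs_xyz = n_freqs_xyz

    def forward(self, points):
        feats = self.backbone(harmonic_embed(points, self.n_freqs_xyz))
        density = torch.relu(self.density_head(feats))   # ReLU keeps density non-negative
        color = self.color_head(feats)                   # Sigmoid keeps colors in [0, 1]
        return density, color

The ReLU on the density head and the Sigmoid on the color head correspond to notes 1 and 2 above.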

Visualization

Run the training with:

python volume_rendering_main.py --config-name=nerf_lego
python volume_rendering_main.py --config-name=nerf_lego_highres
NeRF Lego High Resolution Visualization

4. NeRF Extras (CHOOSE ONE! More than one is extra credit)

4.1 View Dependence (10 points)

I modified the color layer of the NeRF MLP to take both the 3D position and the viewing direction as input: in the forward function, the spatial features are concatenated with the embedded viewing direction before being fed to the color layer.
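
Sticking with the simplified TinyNeRF sketch from Section 3, the view-dependent color head could look like this (the feature and embedding sizes are placeholders, not the actual config values):

import torch
import torch.nn as nn

class ViewDependentColorHead(nn.Module):
    # Illustrative sketch only: replaces the position-only color head from the Section 3 sketch.
    def __init__(self, feat_dim=128, dir_embed_dim=12, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + dir_embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, features, dir_embedding):
        # features: spatial features from the backbone MLP
        # dir_embedding: harmonically embedded (normalized) view direction
        return self.mlp(torch.cat([features, dir_embedding], dim=-1))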

To run the training, first download the NeRF materials dataset and place it in the data/ folder, then run:

python volume_rendering_main.py --config-name=nerf_materials_highres

Visualization

NeRF Materials High Resolution Trained from View Dependence Model

The result shows that the model captures view-dependent effects, such as specular highlights on the object surfaces, reasonably well.

Trade-offs between increased view dependence and generalization quality

Incorporating view dependence into the NeRF model allows it to capture complex lighting effects such as specular highlights and reflections, which are essential for shiny or reflective surfaces. However, it also increases model complexity, requires more parameters, and can overfit to the training views, which hurts generalization to novel viewpoints.

4.2 Coarse/Fine Sampling (10 points)

NeRF employs two networks: a coarse network and a fine network. During the coarse pass, it uses the coarse network to get an estimate of the geometry, and during the fine pass it uses these geometry estimates to sample points more effectively for the fine network. Implement this strategy and discuss trade-offs (speed / quality).

Visualization

Not implemented.

Trade-offs (speed / quality)

Not implemented.

B. Neural Surface Rendering (50 points)

5. Sphere Tracing (10 points)

Implementation

I implemented the sphere tracing algorithm following the lecture slides. It initializes a hit mask and iteratively marches the 3D points along each ray direction to find the surface intersections. In each iteration, we check whether the signed distance is small enough to count as an intersection and update the mask accordingly. The process continues until all rays have intersected or the maximum number of iterations is reached.

Below is my implementation of sphere tracing in renderer.py:

def sphere_tracing(self, implicit_fn, origins, directions):
    N_rays = origins.shape[0]
    points = origins.clone()  # (N_rays, 3)

    # Initialize mask
    mask = torch.zeros((N_rays, 1), dtype=torch.bool, device=origins.device)

    for _ in range(self.max_iters):
        # Predict signed distance for each point
        sdf = implicit_fn(points)  # (N_rays, 1) signed distance to the nearest surface

        # Update points
        points = points + sdf * directions

        # Update mask
        mask = mask | (sdf < 1e-6)
        if mask.all():
            break

    return points, mask

Visualization

Run the training with:

# mkdir images (uncomment when running for the first time)
python -m surface_rendering_main --config-name=torus_surface
Sphere Tracing Visualization

6. Optimizing a Neural SDF (15 points)

Implementation

Note:

  1. Optimization Goal: \(SDF(x) = 0\) for all points \(x\) in the point cloud.
  2. Eikonal Regularization Constraint: \(||\nabla SDF(x)|| = 1\) for all \(x \in \mathbb{R}^3\).
  3. Use HarmonicEmbedding for positional encoding of input 3D points.
  4. Use MLPWithInputSkips to define the MLP architecture similar to Part A.
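
A minimal sketch of the eikonal term, computed with autograd on randomly sampled points (the sampling bound and batch size are assumptions, not the values used in my config):

import torch

def eikonal_loss(sdf_model, n_points=1024, bound=1.5, device="cpu"):
    # Sample random points in a box around the scene (bound is an assumed value).
    pts = (2.0 * torch.rand(n_points, 3, device=device) - 1.0) * bound
    pts.requires_grad_(True)

    sdf = sdf_model(pts)                          # (n_points, 1) signed distances
    grad = torch.autograd.grad(
        outputs=sdf,
        inputs=pts,
        grad_outputs=torch.ones_like(sdf),
        create_graph=True,                        # keep the graph so the loss is trainable
    )[0]                                          # (n_points, 3)

    # Penalize deviation of the gradient norm from 1 (the eikonal constraint).
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()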

Final Loss:

Point Loss: 0.001139
Eikonal Loss: 0.049197

Visualization

Run this to train the NeuralSurface representation:

python -m surface_rendering_main --config-name=points_surface
Input Point Cloud Visualization
Result of the Optimized NeuralSurface

7. VolSDF (15 points)

Implementation

Color Prediction

For the color prediction, similar to the implementation in NeuralRadianceField, I added two fully connected layers that predict RGB color from the features extracted by the SDF MLP. When get_color or get_distance_color is called, I first compute the features for the input points with the SDF MLP and then pass them through the color MLP to obtain the RGB color.
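
A minimal sketch of this shared-feature design (layer sizes and names are illustrative; positional encoding is omitted for brevity):

import torch
import torch.nn as nn

class NeuralSDFWithColor(nn.Module):
    # Illustrative sketch only, not the starter-code class.
    def __init__(self, hidden=128):
        super().__init__()
        self.sdf_mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.distance_head = nn.Linear(hidden, 1)
        # Two fully connected layers predicting RGB from the shared SDF features.
        self.color_head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def get_distance_color(self, points):
        features = self.sdf_mlp(points)           # one pass through the SDF MLP
        return self.distance_head(features), self.color_head(features)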

SDF to Density

From section 3.1 of the VolSDF Paper, the volumetric density is modeled as:

\[ \sigma(x) = \alpha \, \Psi_\beta(-d_\Omega(x)) \]

where \(d_\Omega(x)\) is the SDF, and the Laplace CDF is given by:

\[ \Psi_\beta(s) = \begin{cases} \dfrac{1}{2}\exp\!\left(\dfrac{s}{\beta}\right), & s \le 0 \\[8pt] 1 - \dfrac{1}{2}\exp\!\left(-\dfrac{s}{\beta}\right), & s > 0 \end{cases} \]

The implementation in the sdf_to_density function in renderer.py is as follows:

def sdf_to_density(signed_distance, alpha, beta):
    # Laplace CDF Psi_beta evaluated at s = -SDF, scaled by alpha (VolSDF, Sec. 3.1).
    neg_sdf = -signed_distance
    density = torch.where(
        neg_sdf <= 0,
        0.5 * torch.exp(neg_sdf / beta),          # s <= 0: outside the surface (SDF >= 0)
        1.0 - 0.5 * torch.exp(-neg_sdf / beta)    # s > 0:  inside the surface (SDF < 0)
    )
    return alpha * density

Visualization

Run this to train an SDF on the lego bulldozer model:

python -m surface_rendering_main --config-name=volsdf_surface
Geometry Visualization without color
Color Visualization of the Lego Bulldozer

I chose alpha=10.0 and beta=0.025, which gave the best results in my experiments.

Discussion

  1. What are the parameters \(\alpha\) and \(\beta\) doing here?

\(\alpha\) controls the overall scale of the density: a higher \(\alpha\) makes the surface denser and more opaque. \(\beta\) controls the sharpness of the transition from free space to the surface.

  2. How does a high \(\beta\) bias your learned SDF? What about a low \(\beta\)?

A high \(\beta\) makes the SDF-to-density mapping smoother, so the surface becomes thick and blurry. A low \(\beta\) makes the mapping sharper, concentrating density tightly around the zero-level set and producing sharp edges and surfaces.

  3. Would an SDF be easier to train with volume rendering and a low \(\beta\) or a high \(\beta\)? Why?

Training with a high \(\beta\) is easier because the density varies smoothly around the surface, so the gradients are smoother and less likely to vanish or explode.

  4. Would you be more likely to learn an accurate surface with a high \(\beta\) or a low \(\beta\)? Why?

An accurate surface is more likely to be learned with a low \(\beta\), since the density then concentrates tightly around the true zero-level set of the SDF, though it may be harder to train.

8. Neural Surface Extras (CHOOSE ONE! More than one is extra credit)

8.1. Render a Large Scene with Sphere Tracing (10 points)

I created a new implicit function that combines a box frame with several spheres placed along its edges.
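
A minimal sketch of one way to build such a composite SDF: the box frame uses the standard box-frame distance function, and the scene SDF is the pointwise minimum (union) of the frame and the spheres. The sizes and sphere placement below are illustrative, not the values from my config:

import torch

def sdf_box_frame(points, half_size=1.0, thickness=0.1):
    # Standard box-frame SDF for an axis-aligned cube frame.
    p = points.abs() - half_size
    q = (p + thickness).abs() - thickness

    def edge(a, b, c):
        outside = torch.clamp(torch.stack([a, b, c], dim=-1), min=0.0).norm(dim=-1)
        inside = torch.clamp(torch.max(torch.max(a, b), c), max=0.0)
        return outside + inside

    return torch.min(
        torch.min(edge(p[..., 0], q[..., 1], q[..., 2]),
                  edge(q[..., 0], p[..., 1], q[..., 2])),
        edge(q[..., 0], q[..., 1], p[..., 2]),
    )

def sdf_sphere(points, center, radius):
    return (points - center).norm(dim=-1) - radius

def sdf_scene(points, sphere_centers, sphere_radius=0.2):
    # Union of the frame and the spheres = pointwise minimum of their SDFs.
    d = sdf_box_frame(points)
    for center in sphere_centers:
        d = torch.minimum(d, sdf_sphere(points, center, sphere_radius))
    return d

Wrapping sdf_scene in the same interface as the other implicit functions lets the sphere tracer from Part 5 render it unchanged.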

Visualization

Run the training with:

python -m surface_rendering_main --config-name=complex_boxframe_surface
Box Frame with Sphere Surface Visualization

8.2 Fewer Training Views (10 points)

Implementation

I added a num_train_views parameter to the YAML config to control the number of training views, and I pass that config value into the dataset loader, which selects a random subset of training views using the following code snippet in dataset.py:

# Q8.2: Subsample training views if num_train_views is specified
if num_train_views is not None and num_train_views < len(train_idx):
    print(f"Using {num_train_views} training views (out of {len(train_idx)} available)")
    # Randomly sample num_train_views indices from train_idx
    # Use a fixed random seed for reproducibility
    np.random.seed(42)
    sampled_indices = np.random.choice(len(train_idx), size=num_train_views, replace=False)
    train_idx = [train_idx[i] for i in sampled_indices]
else:
    print(f"Using all {len(train_idx)} training views")

Visualization

Commands:

python volume_rendering_main.py --config-name=nerf_lego_few_views # NeRF with 20 training views
python -m surface_rendering_main --config-name=volsdf_surface_few_views

Below are the comparisons of training VolSDF (geometry + color rendering) and NeRF models with full views and only 20 training views on the lego bulldozer scene.

VolSDF Geometry
20 Training Views
VolSDF Color Rendering
20 Training Views
NeRF Color Rendering
20 Training Views
VolSDF Geometry
Full Training Views
VolSDF Color Rendering
Full Training Views
NeRF Color Rendering
Full Training Views

Both VolSDF and NeRF produce blurrier results when trained with only 20 views.

8.3 Alternate SDF to Density Conversions (10 points)

Implementation

I implemented the SDF to density conversion from the NeuS paper in addition to the VolSDF method.

The NeuS paper proposes the following SDF to density conversion: \[ \phi_\beta(x) = \frac{1}{\beta} \cdot \text{sigmoid}(-\frac{1}{\beta} \cdot x) \cdot (1 - \text{sigmoid}(-\frac{1}{\beta} \cdot x)) \]

Python code implementation:

def sdf_to_density_NeuS(signed_distance, alpha, beta):
    # Logistic density of the SDF, scaled by alpha; beta acts as the scale parameter.
    sigmoid_term = torch.sigmoid(-signed_distance / beta)
    density = alpha * (1.0 / beta) * (sigmoid_term * (1.0 - sigmoid_term))
    return density

Note that I adjusted alpha and beta to 2.0 and 0.02 for NeuS to get better results.

Visualization

Commands:

python -m surface_rendering_main --config-name=neus_surface
VolSDF
NeuS

The two methods produce results of similar quality, with NeuS giving slightly sharper edges.