3D Gaussian Splatting and Diffusion Guided Optimization
16-825 Learning for 3D: Assignment 4
1. 3D Gaussian Splatting
1.1 3D Gaussian Rasterization (35 points)
Run the unit test to verify the implementation:
```bash
python unit_test_gaussians.py
```
Run the rendering code to visualize the pre-trained 3D Gaussians:
```bash
python render.py
```
Below is the result gif.
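At its core, the rasterizer splats each projected Gaussian onto the image and alpha-composites the contributions front to back. Below is a minimal sketch of that compositing step, assuming the Gaussians have already been projected to 2D and depth-sorted; the names and shapes are illustrative, not the assignment's exact API.

```python
import torch

def composite_gaussians(means2d, invcov2d, opacities, colours, pixels):
    """
    Alpha-composite N depth-sorted 2D Gaussians over P pixels.

    means2d:   (N, 2) projected Gaussian centres
    invcov2d:  (N, 2, 2) inverse 2D covariances
    opacities: (N, 1) per-Gaussian opacity in [0, 1]
    colours:   (N, 3) per-Gaussian RGB
    pixels:    (P, 2) pixel coordinates
    """
    # Offsets from every Gaussian centre to every pixel: (N, P, 2)
    d = pixels[None, :, :] - means2d[:, None, :]

    # Quadratic form -0.5 * d^T Sigma^{-1} d gives the Gaussian exponent: (N, P)
    power = -0.5 * torch.einsum("npi,nij,npj->np", d, invcov2d, d)
    alpha = opacities * torch.exp(power)  # per-Gaussian, per-pixel alpha: (N, P)

    # Front-to-back transmittance: cumulative product of (1 - alpha)
    # over all Gaussians in front of the current one.
    T = torch.cumprod(1.0 - alpha + 1e-10, dim=0)
    T = torch.cat([torch.ones_like(T[:1]), T[:-1]], dim=0)

    weights = alpha * T  # (N, P)
    return (weights[:, :, None] * colours[:, None, :]).sum(dim=0)  # (P, 3)
```

The shifted cumulative product implements per-pixel transmittance, so nearer Gaussians occlude farther ones.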
1.2 Training 3D Gaussian Representations (15 points)
Run the training code to generate the training progress and final renderings:
```bash
python train.py
```
Below are the learning rates I used for each parameter; I trained the model for 1000 iterations:
| Parameter | Opacities | Scales | Colours | Means |
|---|---|---|---|---|
| Learning Rate | 0.01 | 0.001 | 0.01 | 0.001 |
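These per-parameter rates can be set with Adam parameter groups. A minimal sketch, assuming the learnable tensors are exposed as leaf parameters under these (illustrative) names:

```python
import torch

optimizer = torch.optim.Adam([
    {"params": [gaussians.opacities], "lr": 0.01},   # names assume a `gaussians`
    {"params": [gaussians.scales],    "lr": 0.001},  # container exposing each
    {"params": [gaussians.colours],   "lr": 0.01},   # learnable tensor
    {"params": [gaussians.means],     "lr": 0.001},
])
```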
The PSNR and SSIM obtained on held out views are as follows:
- PSNR: 29.754
- SSIM: 0.940
1.3 Extensions (Choose at least one! More than one is extra credit)
1.3.1 Rendering Using Spherical Harmonics (10 Points)
I referred to the function `get_color` in this implementation and created a vectorized version of it.
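For reference, here is a minimal sketch of a vectorized degree-1 SH colour evaluation (higher bands follow the same pattern); the shapes and names are illustrative, not the assignment's exact API.

```python
import torch

# Real spherical harmonics constants for bands 0 and 1.
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199

def eval_sh_deg1(sh, dirs):
    """
    sh:   (N, 4, 3) SH coefficients per Gaussian (DC term + 3 band-1 terms)
    dirs: (N, 3) unit viewing directions from the camera to each Gaussian
    """
    x, y, z = dirs[:, 0:1], dirs[:, 1:2], dirs[:, 2:3]
    colour = SH_C0 * sh[:, 0]
    colour = colour - SH_C1 * y * sh[:, 1] + SH_C1 * z * sh[:, 2] - SH_C1 * x * sh[:, 3]
    # Shift to [0, 1] and clamp, as in the reference implementation.
    return (colour + 0.5).clamp(min=0.0)
```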
Run the rendering code with the modified version to visualize with spherical harmonics:
```bash
python render.py
```
Comparison
Below are side-by-side comparisons of the renderings obtained in both cases (original colour vs. spherical-harmonics-based colour).
For view 1, the yellow part of the chair takes on a more golden tone when rendered with spherical harmonics, which better reflects the lighting and material. For views 2 and 3, the lighting and shadows are better represented.
1.3.2 Training On a Harder Scene (10 Points)
Not implemented.
2. Diffusion-guided Optimization
2.1 SDS Loss + Image Optimization (20 points)
Commands to run the image optimization with SDS loss:
```bash
python Q21_image_optimization.py --prompt "a hamburger" --sds_guidance 0 --postfix "_no_guidance"
python Q21_image_optimization.py --prompt "a hamburger" --sds_guidance 1 --postfix "_with_guidance"
python Q21_image_optimization.py --prompt "a standing corgi dog" --sds_guidance 0 --postfix "_no_guidance"
python Q21_image_optimization.py --prompt "a standing corgi dog" --sds_guidance 1 --postfix "_with_guidance"
python Q21_image_optimization.py --prompt "uchiha itachi" --sds_guidance 0 --postfix "_no_guidance"
python Q21_image_optimization.py --prompt "uchiha itachi" --sds_guidance 1 --postfix "_with_guidance"
python Q21_image_optimization.py --prompt "gold bars and cash" --sds_guidance 0 --postfix "_no_guidance"
python Q21_image_optimization.py --prompt "gold bars and cash" --sds_guidance 1 --postfix "_with_guidance"
```
For each prompt, I visualized the output image after 700 iterations of optimization.
| Prompt | Without Guidance | With Guidance |
|---|---|---|
| a hamburger | | |
| a standing corgi dog | | |
| uchiha itachi | | |
| gold bars and cash | | |
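The SDS step itself is compact: add noise to the current latents, have the frozen UNet predict that noise under classifier-free guidance, and backpropagate the weighted residual. Below is a minimal sketch in the common stable-dreamfusion style; it is illustrative rather than this codebase's exact implementation, and `text_embeddings` is assumed to stack the unconditional and conditional embeddings.

```python
import torch
import torch.nn.functional as F

def sds_loss(latents, text_embeddings, unet, scheduler, guidance_scale=100.0):
    """One SDS step on (B, 4, 64, 64) latents; unet and scheduler are frozen."""
    B = latents.shape[0]
    t = torch.randint(20, 980, (B,), device=latents.device)  # random timestep

    with torch.no_grad():
        noise = torch.randn_like(latents)
        noisy = scheduler.add_noise(latents, noise, t)
        # Classifier-free guidance: unconditional and conditional passes.
        noise_pred = unet(torch.cat([noisy] * 2), torch.cat([t] * 2),
                          encoder_hidden_states=text_embeddings).sample
        uncond, cond = noise_pred.chunk(2)
        noise_pred = uncond + guidance_scale * (cond - uncond)

    # w(t) = 1 - alpha_bar_t, the usual SDS weighting.
    w = (1 - scheduler.alphas_cumprod.to(latents.device)[t]).view(B, 1, 1, 1)
    grad = w * (noise_pred - noise)

    # With a detached target, d(loss)/d(latents) equals grad exactly.
    target = (latents - grad).detach()
    return 0.5 * F.mse_loss(latents, target, reduction="sum")
```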
2.2 Texture Map Optimization for Mesh (15 points)
Commands to run the mesh texture optimization with SDS loss:
```bash
python Q22_mesh_optimization.py
python Q22_mesh_optimization.py --prompt "a wooden cow sculpture"
python Q22_mesh_optimization.py --prompt "a granite stone cow"
```
Results for each prompt:
- a hamburger
- a wooden cow sculpture
- a granite stone cow
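The texture optimization follows the same recipe: only the texture map is learnable; each iteration renders the fixed mesh from a random camera and applies SDS to the encoded rendering. A rough sketch under those assumptions; `sample_random_camera`, `render_textured_mesh`, and `vae_encode` are hypothetical helpers, not this codebase's API.

```python
import torch

# Only the texture map is optimized; mesh geometry and the diffusion model stay frozen.
texture = torch.nn.Parameter(torch.rand(1, 3, 512, 512, device="cuda"))
optimizer = torch.optim.Adam([texture], lr=0.01)

for step in range(2000):
    camera = sample_random_camera()                    # hypothetical helper
    rgb = render_textured_mesh(mesh, texture, camera)  # hypothetical differentiable renderer
    latents = vae_encode(rgb)                          # hypothetical SD VAE encode -> (1, 4, 64, 64)
    loss = sds_loss(latents, text_embeddings, unet, scheduler)  # from the sketch in 2.1
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```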
2.3 NeRF Optimization (15 points)
Commands to run the NeRF optimization with SDS loss:
```bash
python Q23_nerf_optimization.py --prompt "a standing corgi dog" --lambda_entropy 0.001 --lambda_orient 0.01 --latent_iter_ratio=0.2
python Q23_nerf_optimization.py --prompt "a LEGO figure" --lambda_entropy 0.01 --lambda_orient 0.05 --latent_iter_ratio=0.2
python Q23_nerf_optimization.py --prompt "a Japanese style bonsai" --lambda_entropy 0.001 --lambda_orient 0.01 --latent_iter_ratio=0.2
```
| Prompt | RGB | Depth |
|---|---|---|
| a standing corgi dog | | |
| a LEGO figure | | |
| a Japanese style bonsai | | |
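The `--lambda_entropy` and `--lambda_orient` flags weight the usual DreamFusion-style regularizers: a binary-entropy term that pushes ray blend weights toward 0 or 1, and an orientation term that penalizes surface normals facing away from the camera. A minimal sketch of one common (stable-dreamfusion-style) formulation, which may differ from this codebase's exact code:

```python
import torch

def entropy_loss(weights):
    """Binary entropy on volume-rendering blend weights: encourages
    opacities toward 0 or 1, suppressing semi-transparent floaters."""
    a = weights.clamp(1e-5, 1 - 1e-5)
    return (-a * torch.log(a) - (1 - a) * torch.log(1 - a)).mean()

def orientation_loss(weights, normals, view_dirs):
    """Penalize normals that point away from the camera
    (positive dot product with the viewing direction)."""
    return (weights.detach() * (normals * view_dirs).sum(-1).clamp(min=0) ** 2).mean()

# total_loss = sds + lambda_entropy * entropy_loss(w) \
#            + lambda_orient * orientation_loss(w, n, d)
```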
2.4 Extensions (Choose at least one! More than one is extra credit)
2.4.1 View-dependent text embedding (10 points)
Commands to run the NeRF optimization with view-dependent text embedding:
python Q23_nerf_optimization.py --prompt "a standing corgi dog" --lambda_entropy 0.001 --lambda_orient 0.01 --latent_iter_ratio=0.2 --view_dep_text 1
python Q23_nerf_optimization.py --prompt "a Japanese style bonsai" --lambda_entropy 0.001 --lambda_orient 0.01 --latent_iter_ratio=0.2 --view_dep_text 1| Prompt | RGB | Depth |
|---|---|---|
a standing corgi dog
|
|
|
a Japanese style bonsai
|
|
|
From the results, we can see that adding the view-dependent text embedding improves the 3D consistency of the generated objects. For example, for "a standing corgi dog", the dog appears more coherent and realistic from different angles than the version trained without view-dependent embeddings, which shows two noses in the side views. Similarly, for "a Japanese style bonsai", the tree structure looks more natural and consistent.
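View-dependent conditioning amounts to appending a view suffix to the prompt based on the sampled camera azimuth before encoding it with the text encoder. A minimal sketch of that binning; the exact angle thresholds are an assumption.

```python
def view_dependent_prompt(prompt: str, azimuth_deg: float) -> str:
    """Append a DreamFusion-style view suffix based on camera azimuth."""
    azimuth_deg = azimuth_deg % 360
    if azimuth_deg < 45 or azimuth_deg >= 315:
        view = "front view"
    elif 135 <= azimuth_deg < 225:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"

# e.g. view_dependent_prompt("a standing corgi dog", 180.0)
# -> "a standing corgi dog, back view"
```

The suffixed prompts are encoded once, and the embedding matching each sampled camera is used in that iteration's SDS step.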
2.4.2 Other 3D representation (10 points)
Not implemented.
2.4.3 Variation of implementation of SDS loss (10 points)
Not implemented.