3D Gaussian Splatting and Diffusion Guided Optimization
16-825 Learning for 3D: Assignment 4
1. 3D Gaussian Splatting
1.1 3D Gaussian Rasterization (35 points)
Run the unit test to verify the implementation:
```bash
python unit_test_gaussians.py
```
Run the rendering code to visualize the pre-trained 3D Gaussians:
```bash
python render.py
```
Below is the result gif.
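At its core, the rasterizer splats each projected Gaussian onto the image and alpha-composites the contributions front to back. Below is a minimal sketch of that compositing step, assuming the Gaussians have already been projected to 2D and depth-sorted; the names and shapes are illustrative, not the assignment's exact API.

```python
import torch

def composite_gaussians(means2d, invcov2d, opacities, colours, pixels):
    """
    Alpha-composite N depth-sorted 2D Gaussians over P pixels.

    means2d:   (N, 2) projected Gaussian centres
    invcov2d:  (N, 2, 2) inverse 2D covariances
    opacities: (N, 1) per-Gaussian opacity in [0, 1]
    colours:   (N, 3) per-Gaussian RGB
    pixels:    (P, 2) pixel coordinates
    """
    # Offsets from every Gaussian centre to every pixel: (N, P, 2)
    d = pixels[None, :, :] - means2d[:, None, :]

    # Quadratic form -0.5 * d^T Sigma^{-1} d gives the Gaussian exponent: (N, P)
    power = -0.5 * torch.einsum("npi,nij,npj->np", d, invcov2d, d)
    alpha = opacities * torch.exp(power)  # per-Gaussian, per-pixel alpha: (N, P)

    # Front-to-back transmittance: cumulative product of (1 - alpha)
    # over all Gaussians in front of the current one.
    T = torch.cumprod(1.0 - alpha + 1e-10, dim=0)
    T = torch.cat([torch.ones_like(T[:1]), T[:-1]], dim=0)

    weights = alpha * T  # (N, P)
    return (weights[:, :, None] * colours[:, None, :]).sum(dim=0)  # (P, 3)
```

The shifted cumulative product implements per-pixel transmittance, so nearer Gaussians occlude farther ones.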
1.2 Training 3D Gaussian Representations (15 points)
Run the training code to generate the training progress and final renderings:
```bash
python train.py
```
Below are the learning rates I used for each parameter; I trained the model for 1000 iterations:
| Parameter | Opacities | Scales | Colours | Means |
|---|---|---|---|---|
| Learning Rate | 0.01 | 0.001 | 0.01 | 0.001 |
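These per-parameter rates can be set with Adam parameter groups. A minimal sketch, assuming the learnable tensors are exposed as leaf parameters under these (illustrative) names:

```python
import torch

optimizer = torch.optim.Adam([
    {"params": [gaussians.opacities], "lr": 0.01},   # names assume a `gaussians`
    {"params": [gaussians.scales],    "lr": 0.001},  # container exposing each
    {"params": [gaussians.colours],   "lr": 0.01},   # learnable tensor
    {"params": [gaussians.means],     "lr": 0.001},
])
```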
The PSNR and SSIM obtained on held out views are as follows:
- PSNR: 29.754
- SSIM: 0.940
1.3 Extensions (Choose at least one! More than one is extra credit)
1.3.1 Rendering Using Spherical Harmonics (10 Points)
I referred to the function `get_color` in this implementation and created a vectorized version of it.
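For reference, here is a minimal sketch of a vectorized degree-1 SH colour evaluation (higher bands follow the same pattern); the shapes and names are illustrative, not the assignment's exact API.

```python
import torch

# Real spherical harmonics constants for bands 0 and 1.
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199

def eval_sh_deg1(sh, dirs):
    """
    sh:   (N, 4, 3) SH coefficients per Gaussian (DC term + 3 band-1 terms)
    dirs: (N, 3) unit viewing directions from the camera to each Gaussian
    """
    x, y, z = dirs[:, 0:1], dirs[:, 1:2], dirs[:, 2:3]
    colour = SH_C0 * sh[:, 0]
    colour = colour - SH_C1 * y * sh[:, 1] + SH_C1 * z * sh[:, 2] - SH_C1 * x * sh[:, 3]
    # Shift to [0, 1] and clamp, as in the reference implementation.
    return (colour + 0.5).clamp(min=0.0)
```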
Run the rendering code with the modified version to visualize with spherical harmonics:
```bash
python render.py
```
Comparison
Below are side-by-side comparisons of the renderings obtained in both cases (original colour vs. spherical-harmonics-based colour).
For view 1, the yellow part of the chair takes on a more golden tone when rendered with spherical harmonics, which better reflects the lighting and material. For views 2 and 3, the lighting and shadows are better represented.
1.3.2 Training On a Harder Scene (10 Points)
Not implemented.
2. Diffusion-guided Optimization
2.1 SDS Loss + Image Optimization (20 points)
Commands to run the image optimization with SDS loss:
```bash
python Q21_image_optimization.py --prompt "a hamburger" --sds_guidance 0 --postfix "_no_guidance"
python Q21_image_optimization.py --prompt "a hamburger" --sds_guidance 1 --postfix "_with_guidance"
python Q21_image_optimization.py --prompt "a standing corgi dog" --sds_guidance 0 --postfix "_no_guidance"
python Q21_image_optimization.py --prompt "a standing corgi dog" --sds_guidance 1 --postfix "_with_guidance"
python Q21_image_optimization.py --prompt "uchiha itachi" --sds_guidance 0 --postfix "_no_guidance"
python Q21_image_optimization.py --prompt "uchiha itachi" --sds_guidance 1 --postfix "_with_guidance"
python Q21_image_optimization.py --prompt "gold bars and cash" --sds_guidance 0 --postfix "_no_guidance"
python Q21_image_optimization.py --prompt "gold bars and cash" --sds_guidance 1 --postfix "_with_guidance"
```
For each prompt, I visualized the output image after 700 iterations of optimization.
| Prompt | Without Guidance | With Guidance |
|---|---|---|
| a hamburger | | |
| a standing corgi dog | | |
| uchiha itachi | | |
| gold bars and cash | | |
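The SDS step itself is compact: add noise to the current latents, have the frozen UNet predict that noise under classifier-free guidance, and backpropagate the weighted residual. Below is a minimal sketch in the common stable-dreamfusion style; it is illustrative rather than this codebase's exact implementation, and `text_embeddings` is assumed to stack the unconditional and conditional embeddings.

```python
import torch
import torch.nn.functional as F

def sds_loss(latents, text_embeddings, unet, scheduler, guidance_scale=100.0):
    """One SDS step on (B, 4, 64, 64) latents; unet and scheduler are frozen."""
    B = latents.shape[0]
    t = torch.randint(20, 980, (B,), device=latents.device)  # random timestep

    with torch.no_grad():
        noise = torch.randn_like(latents)
        noisy = scheduler.add_noise(latents, noise, t)
        # Classifier-free guidance: unconditional and conditional passes.
        noise_pred = unet(torch.cat([noisy] * 2), torch.cat([t] * 2),
                          encoder_hidden_states=text_embeddings).sample
        uncond, cond = noise_pred.chunk(2)
        noise_pred = uncond + guidance_scale * (cond - uncond)

    # w(t) = 1 - alpha_bar_t, the usual SDS weighting.
    w = (1 - scheduler.alphas_cumprod.to(latents.device)[t]).view(B, 1, 1, 1)
    grad = w * (noise_pred - noise)

    # With a detached target, d(loss)/d(latents) equals grad exactly.
    target = (latents - grad).detach()
    return 0.5 * F.mse_loss(latents, target, reduction="sum")
```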
2.2 Texture Map Optimization for Mesh (15 points)
Commands to run the mesh texture optimization with SDS loss:
```bash
python Q22_mesh_optimization.py
python Q22_mesh_optimization.py --prompt "a wooden cow sculpture"
python Q22_mesh_optimization.py --prompt "a granite stone cow"
```
Results for each prompt:
- a hamburger
- a wooden cow sculpture
- a granite stone cow
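The texture optimization follows the same recipe: only the texture map is learnable; each iteration renders the fixed mesh from a random camera and applies SDS to the encoded rendering. A rough sketch under those assumptions; `sample_random_camera`, `render_textured_mesh`, and `vae_encode` are hypothetical helpers, not this codebase's API.

```python
import torch

# Only the texture map is optimized; mesh geometry and the diffusion model stay frozen.
texture = torch.nn.Parameter(torch.rand(1, 3, 512, 512, device="cuda"))
optimizer = torch.optim.Adam([texture], lr=0.01)

for step in range(2000):
    camera = sample_random_camera()                    # hypothetical helper
    rgb = render_textured_mesh(mesh, texture, camera)  # hypothetical differentiable renderer
    latents = vae_encode(rgb)                          # hypothetical SD VAE encode -> (1, 4, 64, 64)
    loss = sds_loss(latents, text_embeddings, unet, scheduler)  # from the sketch in 2.1
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```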
2.3 NeRF Optimization (15 points)
Commands to run the NeRF optimization with SDS loss:
```bash
python Q23_nerf_optimization.py --prompt "a standing corgi dog" --lambda_entropy 0.001 --lambda_orient 0.01 --latent_iter_ratio=0.2
python Q23_nerf_optimization.py --prompt "a LEGO figure" --lambda_entropy 0.01 --lambda_orient 0.05 --latent_iter_ratio=0.2
python Q23_nerf_optimization.py --prompt "a Japanese style bonsai" --lambda_entropy 0.001 --lambda_orient 0.01 --latent_iter_ratio=0.2
```
| Prompt | RGB | Depth |
|---|---|---|
| a standing corgi dog | | |
| a LEGO figure | | |
| a Japanese style bonsai | | |
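The `--lambda_entropy` and `--lambda_orient` flags weight the usual DreamFusion-style regularizers: a binary-entropy term that pushes ray blend weights toward 0 or 1, and an orientation term that penalizes surface normals facing away from the camera. A minimal sketch of one common (stable-dreamfusion-style) formulation, which may differ from this codebase's exact code:

```python
import torch

def entropy_loss(weights):
    """Binary entropy on volume-rendering blend weights: encourages
    opacities toward 0 or 1, suppressing semi-transparent floaters."""
    a = weights.clamp(1e-5, 1 - 1e-5)
    return (-a * torch.log(a) - (1 - a) * torch.log(1 - a)).mean()

def orientation_loss(weights, normals, view_dirs):
    """Penalize normals that point away from the camera
    (positive dot product with the viewing direction)."""
    return (weights.detach() * (normals * view_dirs).sum(-1).clamp(min=0) ** 2).mean()

# total_loss = sds + lambda_entropy * entropy_loss(w) \
#            + lambda_orient * orientation_loss(w, n, d)
```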
2.4 Extensions (Choose at least one! More than one is extra credit)
2.4.1 View-dependent text embedding (10 points)
Commands to run the NeRF optimization with view-dependent text embedding:
python Q23_nerf_optimization.py --prompt "a standing corgi dog" --lambda_entropy 0.001 --lambda_orient 0.01 --latent_iter_ratio=0.2 --view_dep_text 1
python Q23_nerf_optimization.py --prompt "a Japanese style bonsai" --lambda_entropy 0.001 --lambda_orient 0.01 --latent_iter_ratio=0.2 --view_dep_text 1| Prompt | RGB | Depth |
|---|---|---|
a standing corgi dog
|
|
|
a Japanese style bonsai
|
|
|
From the results, we can see that adding the view-dependent text embedding improves the 3D consistency of the generated objects. For example, for "a standing corgi dog", the dog appears more coherent and realistic from different angles than the version trained without view-dependent embeddings, which shows two noses in the side views. Similarly, for "a Japanese style bonsai", the tree structure looks more natural and consistent.
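View-dependent conditioning amounts to appending a view suffix to the prompt based on the sampled camera azimuth before encoding it with the text encoder. A minimal sketch of that binning; the exact angle thresholds are an assumption.

```python
def view_dependent_prompt(prompt: str, azimuth_deg: float) -> str:
    """Append a DreamFusion-style view suffix based on camera azimuth."""
    azimuth_deg = azimuth_deg % 360
    if azimuth_deg < 45 or azimuth_deg >= 315:
        view = "front view"
    elif 135 <= azimuth_deg < 225:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"

# e.g. view_dependent_prompt("a standing corgi dog", 180.0)
# -> "a standing corgi dog, back view"
```

The suffixed prompts are encoded once, and the embedding matching each sampled camera is used in that iteration's SDS step.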
2.4.2 Other 3D representation (10 points)
Not implemented.
2.4.3 Variation of implementation of SDS loss (10 points)
Not implemented.