Ray tracing needs an acceleration structure
Goal: efficiently find intersections between a ray and a set of triangles
⤷ Wide bounding volume hierarchy (BVH)
Type | Architecture | Structure |
---|---|---|
Hardware RT | AMD RDNA3 | 4-wide BVH |
Hardware RT | AMD RDNA4 | 8-wide BVH |
Hardware RT | NVIDIA Lovelace | ??-wide BVH |
Hardware RT | Intel Xe-HPG | 6-wide BVH |
Software RT | CWBVH | 8-wide BVH |
Challenge: Building these acceleration structures efficiently on a GPU
Use case: complex dynamic scenes
Changing topology and significant deformations require a full rebuild
Very expensive!
Our focus: reducing build time, at the cost of some tracing speed
Worth it when build cost is a bottleneck
No additional traversal!
Maximize build speed, at the cost of some tracing performance.
Traverse a binary BVH bottom-up, output a wide BVH.
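To make the idea concrete, here is a minimal CPU sketch of collapsing a binary BVH into a wide one in a single bottom-up pass. It is an illustration under assumptions, not the method from the talk: the names `BinaryNode`, `WideNode`, `collapse`, and `build_wide` are hypothetical, bounding boxes are omitted, recursion stands in for the leaf-to-root walk a GPU builder would use, and the rule "close the larger side first" is just one simple way to stay within the width limit.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class BinaryNode:
    left: Optional["BinaryNode"] = None
    right: Optional["BinaryNode"] = None
    prim: Optional[int] = None  # primitive id, set for leaves only


@dataclass
class WideNode:
    # Each child is either another wide node or a primitive id.
    children: List[Union["WideNode", int]]


def collapse(node: BinaryNode, width: int) -> list:
    """Return the open references (at most `width`) for this subtree.

    Results flow from the leaves to the root, so every binary node is
    visited exactly once and no second pass over the tree is needed.
    """
    if node.prim is not None:
        return [node.prim]
    left = collapse(node.left, width)
    right = collapse(node.right, width)
    # ASSUMPTION: when both sides do not fit into one wide node, close
    # the larger side into a finished wide node and keep one reference.
    while len(left) + len(right) > width:
        if len(left) >= len(right):
            left = [WideNode(left)]
        else:
            right = [WideNode(right)]
    return left + right


def build_wide(root: BinaryNode, width: int = 4) -> WideNode:
    """Collapse a binary BVH (width >= 2) and wrap the result in a root."""
    return WideNode(collapse(root, width))
```

With this greedy rule a balanced 8-leaf tree and `width=4` yields a root with only two children, each holding four primitives. That is valid but not fully packed, which hints at the tree-quality trade-off that a build-speed-first approach accepts.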
We integrate our bottom-up collapsing within the PLOC loop.
If two clusters have different numbers of references: penalize their distance
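A small sketch of how such a penalty could enter a PLOC-style nearest-neighbor search. PLOC searches a fixed window of neighbors along the Morton-sorted cluster array and merges mutually nearest pairs, using the surface area of the merged bounding box as the distance. Everything specific to the penalty here is an assumption for illustration: the slide does not state its form, so the multiplicative factor, the `penalty` parameter, and the names `Cluster`, `distance`, `nearest_neighbors`, and `mutual_pairs` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cluster:
    lo: Tuple[float, float, float]  # AABB minimum corner
    hi: Tuple[float, float, float]  # AABB maximum corner
    num_refs: int                   # references currently held by the cluster


def merged_area(a: Cluster, b: Cluster) -> float:
    """Surface area of the AABB enclosing both clusters."""
    lo = [min(p, q) for p, q in zip(a.lo, b.lo)]
    hi = [max(p, q) for p, q in zip(a.hi, b.hi)]
    dx, dy, dz = (h - l for h, l in zip(hi, lo))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def distance(a: Cluster, b: Cluster, penalty: float) -> float:
    d = merged_area(a, b)
    # ASSUMPTION: a constant multiplicative penalty when the reference
    # counts differ; the talk does not specify the actual formula.
    if a.num_refs != b.num_refs:
        d *= 1.0 + penalty
    return d


def nearest_neighbors(clusters: List[Cluster], radius: int,
                      penalty: float) -> List[int]:
    """For each cluster, index of the closest cluster within `radius` slots."""
    n = len(clusters)
    nearest = []
    for i in range(n):
        best, best_d = -1, float("inf")
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            if j == i:
                continue
            d = distance(clusters[i], clusters[j], penalty)
            if d < best_d:
                best, best_d = j, d
        nearest.append(best)
    return nearest


def mutual_pairs(nearest: List[int]) -> List[Tuple[int, int]]:
    """Pairs (i, j) with i < j that chose each other; these get merged."""
    return [(i, j) for i, j in enumerate(nearest) if i < j and nearest[j] == i]
```

The effect of the penalty is to steer merges toward clusters of equal size: a slightly farther neighbor with the same reference count can win over a closer one with a different count.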
Compared algorithms:
All measurements done on an NVIDIA RTX 3090
More results in the paper!
Build time: consistent reduction across all scenes \( (\times 0.65\text{--}0.71) \)
Combined time: overall reduction when build time dominates \( (\times 0.74\text{--}0.81)\)
Limitation: lower tree quality leads to higher trace time \( (\times 1.11\text{--}1.37) \)
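The trade-off in these three numbers can be worked through with a simple model, assuming combined time is just build time plus trace time. The function names are hypothetical, and pairing a particular build ratio with a particular trace ratio is an assumption: the slides give ranges across scenes, not per-scene pairs, so the output is only a rough guide to when the faster build pays off.

```python
def combined_ratio(build_share: float, build_ratio: float,
                   trace_ratio: float) -> float:
    """New combined time relative to the baseline combined time.

    build_share: fraction of the baseline combined time spent building.
    build_ratio: new build time / baseline build time (e.g. 0.65).
    trace_ratio: new trace time / baseline trace time (e.g. 1.37).
    """
    return build_share * build_ratio + (1.0 - build_share) * trace_ratio


def break_even_build_share(build_ratio: float, trace_ratio: float) -> float:
    """Baseline build share above which the combined time improves.

    Solves f * build_ratio + (1 - f) * trace_ratio = 1 for f.
    """
    return (trace_ratio - 1.0) / (trace_ratio - build_ratio)


if __name__ == "__main__":
    # ASSUMPTION: illustrative pairings of the range endpoints from the slides.
    for b, t in [(0.65, 1.37), (0.71, 1.11)]:
        f = break_even_build_share(b, t)
        print(f"build x{b}, trace x{t}: pays off when build share > {f:.2f}")
```

Under this model, the larger the trace-time penalty, the larger the share of frame time the build must occupy before the method wins, which matches the "when build time dominates" qualifier.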
Building wide BVHs is an understudied problem.
We propose a construction algorithm that focuses on maximizing build speed.
There is room for new approaches!
Our construction procedure results in non-deterministic node ordering
Ordering matters for cache locality during tracing (node size < cache line)
Partial breadth-first reordering: top few levels only
Very fast (~1% of build time)
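The reordering step can be sketched as follows: nodes in the top few levels are moved to the front of the node array in breadth-first order, and all deeper nodes keep their original relative order. This is a sequential CPU illustration under assumptions, not the GPU implementation from the talk; the adjacency-list layout and the names `partial_bfs_reorder` and `max_depth` are hypothetical.

```python
from collections import deque
from typing import List, Tuple


def partial_bfs_reorder(children: List[List[int]], root: int = 0,
                        max_depth: int = 2) -> Tuple[List[List[int]], List[int]]:
    """Reorder only the nodes at depth <= max_depth (root is depth 0).

    children[i] lists the child node indices of node i (empty for leaves).
    Returns the remapped child lists in the new order and the
    old-index -> new-index table.
    """
    n = len(children)
    placed = [False] * n
    order = []
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        order.append(node)
        placed[node] = True
        if depth < max_depth:  # stop descending below the reordered levels
            for c in children[node]:
                queue.append((c, depth + 1))
    # Deeper nodes are left in their original relative order.
    order.extend(i for i in range(n) if not placed[i])

    old_to_new = [0] * n
    for new, old in enumerate(order):
        old_to_new[old] = new
    new_children = [[old_to_new[c] for c in children[old]] for old in order]
    return new_children, old_to_new
```

Restricting the pass to the top levels keeps the cost small because those levels contain only a tiny fraction of all nodes, while they are the ones visited by nearly every ray.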