Fused Collapsing for Wide BVH Construction
Wilhem Barbier
Mathias Paulin

IRIT, Université de Toulouse, CNRS
Real-time ray tracing

Direct lighting
Reflections
Path tracing

Needs an acceleration structure

Acceleration structures

Goal: efficiently find intersections between a ray and a set of triangles

⤷ Wide bounding volume hierarchy (BVH)


Type         Architecture     Structure
Hardware RT  AMD RDNA3        4-wide BVH
Hardware RT  AMD RDNA4        8-wide BVH
Hardware RT  NVIDIA Lovelace  ??-wide BVH
Hardware RT  Intel Xe-HPG     6-wide BVH
Software RT  CWBVH            8-wide BVH

Motivation

Challenge: Building these acceleration structures efficiently on a GPU

Use case: complex dynamic scenes

Changing topology and significant deformations require a full rebuild

Very expensive!

Our focus: reducing build time, at the cost of some tracing speed

Worth it when build cost is a bottleneck

BVH construction on the GPU

Challenge: Building these acceleration structures efficiently on a GPU

Binary BVH builders only
Binary builder + separate collapsing
Overview


No additional traversal!

Maximize build speed, at the cost of some tracing performance.

Main contributions

Bottom-up collapsing

Traverse a binary BVH bottom-up, output a wide BVH.
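
The idea of a bottom-up collapse can be sketched sequentially as below. This is a minimal illustration, not the authors' GPU implementation: the `BinaryNode` layout, the `collapse_bottom_up` name, and the greedy overflow rule (close the larger pending child list into a wide node when the two lists no longer fit together) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BinaryNode:
    left: int = -1   # index of the left child, -1 for a leaf
    right: int = -1  # index of the right child, -1 for a leaf
    prim: int = -1   # primitive index, only meaningful for leaves


def collapse_bottom_up(nodes, root, width):
    """Collapse a binary BVH into a wide BVH with at most `width` children per node.

    Returns (wide_nodes, root_index). Each wide node is a list of child
    references, either ("prim", i) or ("node", j).
    """
    wide_nodes = []

    def close(children):
        # Emit a finished wide node and return a single reference to it.
        wide_nodes.append(children)
        return [("node", len(wide_nodes) - 1)]

    # Parent-first DFS order; reversing it visits children before parents.
    order, stack = [], [root]
    while stack:
        i = stack.pop()
        order.append(i)
        if nodes[i].left >= 0:
            stack += [nodes[i].left, nodes[i].right]

    pending = {}  # binary node index -> child list still open for merging
    for i in reversed(order):
        n = nodes[i]
        if n.left < 0:
            pending[i] = [("prim", n.prim)]
            continue
        a, b = pending.pop(n.left), pending.pop(n.right)
        # Assumed greedy rule: close the larger list until both fit in one node.
        while len(a) + len(b) > width:
            if len(a) >= len(b):
                a = close(a)
            else:
                b = close(b)
        pending[i] = a + b

    wide_nodes.append(pending[root])
    return wide_nodes, len(wide_nodes) - 1
```

Because each binary node is visited once and only inspects its two children, the pass needs no top-down traversal of the finished binary tree.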

PLOC algorithm [Meister17]
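
For readers unfamiliar with PLOC, its loop can be sketched sequentially as follows: clusters sorted along a Morton curve each look for their nearest neighbor within a small index radius, mutual nearest neighbors are merged, and the cluster array is compacted before the next iteration. The function names, the tuple-based node layout, and the default radius are assumptions for this example; the real algorithm runs each step in parallel on the GPU.

```python
def surface_area(box):
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def box_union(a, b):
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))


def ploc(boxes, radius=2):
    """Build a binary BVH from leaf AABBs assumed to be sorted by Morton code.

    Returns (nodes, root). nodes[i] = (box, left, right); leaves use -1 for both.
    """
    nodes = [(b, -1, -1) for b in boxes]
    clusters = list(range(len(boxes)))
    while len(clusters) > 1:
        n = len(clusters)
        # 1. Nearest-neighbor search in a window of +/- radius around each cluster.
        nearest = []
        for i in range(n):
            best, best_d = -1, float("inf")
            for j in range(max(0, i - radius), min(n, i + radius + 1)):
                if j == i:
                    continue
                d = surface_area(box_union(nodes[clusters[i]][0], nodes[clusters[j]][0]))
                if d < best_d:
                    best, best_d = j, d
            nearest.append(best)
        # 2. Merge mutual nearest neighbors; 3. compact the survivors.
        next_clusters = []
        for i in range(n):
            j = nearest[i]
            if nearest[j] == i:
                if i < j:  # the lower index of the pair creates the parent
                    box = box_union(nodes[clusters[i]][0], nodes[clusters[j]][0])
                    nodes.append((box, clusters[i], clusters[j]))
                    next_clusters.append(len(nodes) - 1)
            else:
                next_clusters.append(clusters[i])
        clusters = next_clusters
    return nodes, clusters[0]
```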

Fusing with PLOC

We integrate our bottom-up collapsing within the PLOC loop.
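
One way to picture the fusion is a merge step that never materializes a binary node: each PLOC cluster carries the child list of a wide node still under construction, and merging two clusters applies a collapsing rule directly. The sketch below is hypothetical; the `fused_merge` name, the `(box, children)` cluster layout, and the "close the larger list on overflow" rule are assumptions, not the rule from the paper.

```python
def box_union(a, b):
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))


def fused_merge(a, b, width, wide_nodes):
    """Merge two clusters during a PLOC iteration and collapse on the fly.

    A cluster is (box, children), where children is a list of (box, ref) and
    ref is ("prim", i) or ("node", j). Finished wide nodes are appended to
    `wide_nodes`. Returns the merged cluster.
    """
    box_a, children_a = a
    box_b, children_b = b
    # Assumed rule: while both lists do not fit in one wide node, close the
    # larger one and keep a single reference to it.
    while len(children_a) + len(children_b) > width:
        if len(children_a) >= len(children_b):
            wide_nodes.append(children_a)
            children_a = [(box_a, ("node", len(wide_nodes) - 1))]
        else:
            wide_nodes.append(children_b)
            children_b = [(box_b, ("node", len(wide_nodes) - 1))]
    return (box_union(box_a, box_b), children_a + children_b)
```

In this picture the wide nodes are complete as soon as the PLOC loop ends, which is what removes the separate collapsing pass.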

Tree quality

Merge penalty

If two clusters hold different numbers of references, penalize their distance
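
As a rough illustration, such a penalty could be applied inside the nearest-neighbor distance. The multiplicative form and the `penalty` value below are assumptions chosen for the example; the slide does not give the exact formula.

```python
def penalized_distance(merged_area, refs_a, refs_b, penalty=0.5):
    """Distance between two clusters used by the nearest-neighbor search.

    merged_area: surface area of the union of both cluster bounding boxes.
    refs_a, refs_b: number of references currently held by each cluster.
    penalty: hypothetical factor, not a value taken from the paper.
    """
    if refs_a != refs_b:
        # Unequal clusters look farther apart, so equal-sized merges win ties.
        return merged_area * (1.0 + penalty)
    return merged_area
```

Favoring merges between clusters of equal size tends to produce fuller wide nodes, which is the tree-quality concern this slide addresses.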


Results: experimental setting

Compared algorithms:

All measurements were taken on an NVIDIA RTX 3090

More results in the paper!

Test scenes

Results: timings

Build time: consistent reduction across all scenes \( (\times 0.65\text{--}0.71) \)


Combined time: overall reduction when build time dominates \( (\times 0.74\text{--}0.81)\)

Limitation: lower tree quality leads to higher trace time \( (\times 1.11\text{--}1.37) \)
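
The trade-off between the build and trace ratios can be made concrete with a small calculation. If a fraction `build_share` of the baseline combined time is spent building, the new combined time relative to the baseline is a weighted sum of the two ratios. The function names are assumptions, and pairing a given build ratio with a given trace ratio is illustrative only; it does not correspond to a specific scene from the results.

```python
def combined_ratio(build_ratio, trace_ratio, build_share):
    """Combined (build + trace) time relative to the baseline.

    build_share: fraction of the baseline combined time spent on building.
    """
    return build_ratio * build_share + trace_ratio * (1.0 - build_share)


def break_even_build_share(build_ratio, trace_ratio):
    """Build share above which the faster build outweighs the slower tracing.

    Solves build_ratio * s + trace_ratio * (1 - s) = 1 for s.
    """
    return (trace_ratio - 1.0) / (trace_ratio - build_ratio)


# Illustrative pairing of ratios taken from the ranges quoted above.
share = break_even_build_share(0.65, 1.37)
print(f"break-even build share: {share:.2f}")
```

Any workload whose build share exceeds this threshold sees an overall reduction, which matches the "when build time dominates" condition.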

Conclusion

Building wide BVHs is an understudied problem.

We propose a construction algorithm that focuses on maximizing build speed.

There is room for new approaches!

Questions?

Partial reordering

Our construction procedure results in non-deterministic node ordering

Ordering matters for cache locality during tracing (node size < cache line)

Partial breadth-first reordering: top few levels only

Very fast (~1% of build time)
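
A partial breadth-first reordering could look like the sketch below: nodes in the top few levels are moved to the front of the node array in breadth-first order, and all other nodes keep their relative order. The `partial_bfs_reorder` name, the adjacency-list node layout, and the `max_depth` parameter are assumptions for this example.

```python
from collections import deque


def partial_bfs_reorder(children, root, max_depth):
    """Reorder the top levels of a wide BVH breadth-first.

    children[i] lists the child node indices of node i (internal nodes only).
    Nodes at depth <= max_depth are placed first, in BFS order.
    Returns (new_children, new_root) with all indices remapped.
    """
    order = []
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        order.append(node)
        if depth < max_depth:
            for child in children[node]:
                queue.append((child, depth + 1))

    # Deeper nodes keep the order produced by the builder.
    placed = set(order)
    order += [i for i in range(len(children)) if i not in placed]

    new_index = {old: new for new, old in enumerate(order)}
    new_children = [[new_index[c] for c in children[old]] for old in order]
    return new_children, new_index[root]
```

Only the nodes near the root are touched by the BFS itself, which is consistent with the low cost reported above.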
