TriBench.
The standard for Triton kernel benchmarking. Fast, reproducible, and beautifully extensible.
Zero Overhead
Compile and warmup phases run to completion before measurement begins, so JIT compilation time never appears in the reported timings.
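A minimal sketch of that separation, in plain Python rather than TriBench's own API (which is not shown here). `FakeKernel` and `measure` are hypothetical names, and the 50 ms sleep stands in for a one-time JIT compile:

```python
import time
from statistics import median


class FakeKernel:
    """Stand-in for a JIT-compiled kernel: the first call pays a one-time cost."""

    def __init__(self):
        self.compile_count = 0
        self.calls = 0

    def __call__(self):
        if self.compile_count == 0:
            time.sleep(0.05)  # simulated JIT compilation (assumption, not real Triton)
            self.compile_count += 1
        self.calls += 1


def measure(fn, warmup=3, iters=10):
    """Return per-call timings in seconds, excluding the warmup calls."""
    # Warmup phase: triggers compilation and caching; nothing here is timed.
    for _ in range(warmup):
        fn()
    # Measurement phase: only these calls contribute to the result.
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


kernel = FakeKernel()
timings = measure(kernel, warmup=1, iters=5)
print(f"median: {median(timings) * 1e6:.1f} us over {len(timings)} timed calls")
```

On a GPU, kernel launches are asynchronous, so the timed region would also need a device synchronization (for example `torch.cuda.synchronize()`) or CUDA events around it.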
Reproducible
Environment capture and strict random seed control keep inputs and configuration identical from run to run, so benchmark numbers stay comparable across runs.
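The idea can be illustrated with the standard library alone. This is a sketch under assumptions, not TriBench's implementation: `capture_environment` and `seeded_inputs` are hypothetical helpers, and a real GPU setup would also record the device model, driver, and Triton and PyTorch versions, and seed the framework's generator (for example with `torch.manual_seed`).

```python
import platform
import random
import sys


def capture_environment():
    """Snapshot the parts of the environment that can affect results."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
    }


def seeded_inputs(seed, n):
    """Generate n pseudo-random floats from a private, explicitly seeded RNG."""
    rng = random.Random(seed)  # private generator: no shared global state
    return [rng.random() for _ in range(n)]


env = capture_environment()
first = seeded_inputs(seed=0, n=4)
second = seeded_inputs(seed=0, n=4)
print(env)
print("identical inputs across runs:", first == second)
```

Storing the environment snapshot next to each result makes it possible to tell later whether two numbers were produced under the same conditions.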
Composable
Easily benchmark and compare multiple kernel variants alongside your main implementation.
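As a rough illustration of side-by-side comparison, the sketch below times several named callables on the same inputs. It uses plain Python functions in place of Triton kernels, and `compare`, `sum_loop`, and `sum_builtin` are hypothetical names, not part of TriBench:

```python
import time
from statistics import median


def compare(variants, args, warmup=2, iters=5):
    """Time each named callable on the same args; return {name: median seconds}."""
    results = {}
    for name, fn in variants.items():
        for _ in range(warmup):  # untimed warmup calls
            fn(*args)
        samples = []
        for _ in range(iters):
            start = time.perf_counter()
            fn(*args)
            samples.append(time.perf_counter() - start)
        results[name] = median(samples)
    return results


def sum_loop(xs):
    """Baseline: explicit Python loop."""
    total = 0
    for x in xs:
        total += x
    return total


def sum_builtin(xs):
    """Variant: built-in sum."""
    return sum(xs)


data = list(range(10_000))
results = compare({"baseline": sum_loop, "variant": sum_builtin}, (data,))
# Report fastest first.
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds * 1e6:.1f} us")
```

Keeping variants in a name-to-callable mapping means adding another implementation to the comparison is a one-line change.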