TriBench is a lightweight framework designed to simplify the evaluation of Triton kernels. It provides a standardized environment for testing correctness, profiling latency, and comparing different kernel optimizations (e.g., fused vs. sequential) with high reproducibility.
✨ Features
- Standardized Evaluation: Unified CLI to list, test, and benchmark all registered kernels.
- Fair Comparison: Compile and warmup phases are strictly isolated from the measurement phase to eliminate JIT and autotuning overhead.
- Kernel Composition: Easily benchmark and compare multiple kernel variants alongside your main implementation under identical conditions.
- Reproducibility: Environment capture and strict random seed control ensure stable benchmark numbers across different runs.
- Rich Metrics: Reports median (p50) and tail (p95) latency, hardware utilization (TFLOPS), and memory bandwidth (GB/s).
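As a sketch of how such percentile metrics can be derived from raw per-iteration timings (the helper below is illustrative only, not part of TriBench's API; the `flops` and `bytes_moved` inputs are assumed to be supplied by the caller):

```python
import statistics

def summarize_latency(times_ms, flops, bytes_moved):
    """Reduce a list of per-iteration timings (ms) to p50/p95 latency
    plus throughput numbers derived from the median."""
    times = sorted(times_ms)
    p50 = statistics.median(times)
    # Simple index-based p95 estimate over the sorted timings.
    p95 = times[min(len(times) - 1, int(0.95 * len(times)))]
    # Convert median latency (ms) to achieved throughput.
    tflops = flops / (p50 * 1e-3) / 1e12       # TFLOP/s
    gbps = bytes_moved / (p50 * 1e-3) / 1e9    # GB/s
    return {"p50_ms": p50, "p95_ms": p95, "tflops": tflops, "gbps": gbps}
```

Deriving TFLOPS and GB/s from the median rather than the mean keeps the headline numbers robust against occasional slow outlier iterations.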
🧠 Design Philosophy
TriBench adopts a decoupled, data-driven architecture. The core framework logic is separated from the kernel implementations, ensuring minimal overhead and maximum extensibility.
- Zero Framework Modification: Adding a new kernel requires no changes to the core framework code. Simply drop a self-contained kernel directory into `kernels/`.
- Data-Driven Configuration: Each kernel defines its entrypoints, test shapes, and benchmark parameters locally via a `meta.json` file.
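A `meta.json` might look like the sketch below. The exact schema is not documented here, so the field names (`test_shapes`, `bench`, and the entrypoint keys) are illustrative assumptions rather than TriBench's actual format:

```json
{
  "name": "rope",
  "entrypoints": {
    "reference": "reference.py:run",
    "triton": "triton_impl.py:run"
  },
  "test_shapes": [[1, 32, 128, 64], [8, 32, 2048, 64]],
  "bench": { "dtype": "fp16", "warmup_ms": 200, "rep_ms": 2000 }
}
```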
📦 Project Layout

```
tribench/              # Core framework logic and CLI
kernels/               # Self-contained kernel implementations
└── rope/
    ├── meta.json      # Kernel configuration & test cases
    ├── reference.py   # PyTorch baseline
    ├── triton_impl.py # Triton implementation
    └── bench_entry.py # Custom benchmark harnesses (optional)
scripts/               # Automation and helper scripts
tests/                 # Unit tests for the framework itself
```
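A layout like this is typically consumed by scanning `kernels/` for `meta.json` files at startup. A minimal sketch of such discovery, assuming nothing about TriBench's internals beyond the directory convention above:

```python
import json
import tempfile
from pathlib import Path

def discover_kernels(root):
    """Map kernel name -> parsed meta.json for every self-contained
    kernel directory found directly under `root`."""
    registry = {}
    for meta_path in sorted(Path(root).glob("*/meta.json")):
        registry[meta_path.parent.name] = json.loads(meta_path.read_text())
    return registry

# Demo on a throwaway directory so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    kernel_dir = Path(tmp) / "rope"
    kernel_dir.mkdir()
    (kernel_dir / "meta.json").write_text('{"name": "rope"}')
    registry = discover_kernels(tmp)
    print(sorted(registry))  # ['rope']
```

Because discovery is purely filesystem-driven, adding a kernel never touches framework code, which is exactly the "zero framework modification" property described above.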
🚀 Getting Started
Installation
```shell
git clone https://github.com/your-org/TriBench.git
cd TriBench
pip install -e ".[dev]"
```
Quick Commands
| Command | Description |
|---|---|
| `tribench list` | List all available kernels in the registry. |
| `tribench validate-meta` | Verify the structure and syntax of all `meta.json` files. |
| `tribench test --kernel all` | Run correctness checks against PyTorch references. |
| `tribench run --kernel all` | Execute benchmarks and report performance metrics. |
Example Benchmark Run:
```shell
tribench run --kernel rope --dtype fp16 --warmup-ms 200 --rep-ms 2000
```
🛠 Adding a Custom Kernel
Extending TriBench is straightforward. To add a new kernel, follow these steps:
1. Create a new directory under `kernels/<your_kernel>/`.
2. Define the exact shapes and entrypoints in `meta.json`.
3. Implement the PyTorch reference inside `reference.py`.
4. Implement the Triton logic inside `triton_impl.py`.
5. Run `tribench test --kernel <your_kernel>` to verify correctness.
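For step 3, a `reference.py` would normally express the kernel's semantics with plain PyTorch ops. The sketch below uses a hypothetical elementwise "scale-add" kernel and plain Python lists instead of tensors, purely to keep the example dependency-free:

```python
# reference.py for a hypothetical elementwise "scale_add" kernel.
# A real TriBench reference would operate on torch tensors; plain
# lists keep this illustration self-contained.

def run(x, y, alpha=1.0):
    """Reference semantics: out[i] = x[i] + alpha * y[i]."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [xi + alpha * yi for xi, yi in zip(x, y)]
```

The point of the reference is readability, not speed: correctness checks compare the Triton implementation's output against this ground truth.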
To evaluate against another variant (like a sequential baseline), simply declare it under the `variants` block in `meta.json`:
"entrypoints": {
"triton": "fused.py:run",
"variants": {
"sequential": "baseline.py:run"
}
}
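Entrypoint strings of the form `file.py:function` are straightforward to resolve. A hypothetical parser (illustrative only, not TriBench's actual loader):

```python
def parse_entrypoint(spec):
    """Split a 'file.py:function' entrypoint spec into its two parts,
    validating the shape before any import is attempted."""
    path, sep, func = spec.partition(":")
    if not sep or not path.endswith(".py") or not func:
        raise ValueError(f"expected '<file>.py:<function>', got {spec!r}")
    return path, func

print(parse_entrypoint("baseline.py:run"))  # ('baseline.py', 'run')
```

Keeping the spec as a single string makes `meta.json` compact while still pinning each variant to one concrete callable.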
🙏 Acknowledgements
TriBench incorporates high-performance Triton kernels and ideas from several outstanding open-source projects: