Skip to content
TriBench Logo

TriBench

A Reproducible and Extensible Benchmark Suite for Triton Kernels.

Python Version Triton License


TriBench is a lightweight framework designed to simplify the evaluation of Triton kernels. It provides a standardized environment for testing correctness, profiling latency, and comparing different kernel optimizations (e.g., fused vs. sequential) with high reproducibility.

✨ Features

  • Standardized Evaluation: Unified CLI to list, test, and benchmark all registered kernels.
  • Fair Comparison: Compile and warmup phases are strictly isolated from the measurement phase to eliminate JIT and autotuning overhead.
  • Kernel Composition: Easily bench and compare multiple kernel variants alongside your main implementation under identical conditions.
  • Reproducibility: Environment capture and strict random seed control ensure stable benchmark numbers across different runs.
  • Rich Metrics: Reports median (p50) and tail (p95) latency, hardware utilization (TFLOPS), and memory bandwidth (GB/s).

🧠 Design Philosophy

TriBench adopts a decoupled, data-driven architecture. The core framework logic is separated from the kernel implementations, ensuring minimal overhead and maximum extensibility.

  • Zero Framework Modification: Adding a new kernel requires zero changes to the core framework code. Simply drop a self-contained kernel directory into kernels/.
  • Data-Driven Configuration: Each kernel defines its entrypoints, test shapes, and benchmark parameters locally via a meta.json file.
tribench/          # Core framework logic and CLI
kernels/           # Self-contained kernel implementations
  ├── rope/
  │   ├── meta.json         # Kernel configuration & test cases
  │   ├── reference.py      # PyTorch baseline
  │   ├── triton_impl.py    # Triton implementation
  │   └── bench_entry.py    # Custom benchmark harnesses (optional)
scripts/           # Automations and helper scripts
tests/             # Unit tests for the framework itself

🚀 Getting Started

Installation

git clone https://github.com/your-org/TriBench.git
cd TriBench
pip install -e ".[dev]"

Quick Commands

Command Description
tribench list List all available kernels in the registry.
tribench validate-meta Verify the structure and syntax of all meta.json files.
tribench test --kernel all Run correctness checks against PyTorch references.
tribench run --kernel all Execute benchmarks and report performance metrics.

Example Benchmark Run:

tribench run --kernel rope --dtype fp16 --warmup-ms 200 --rep-ms 2000

🛠 Adding a Custom Kernel

Extending TriBench is straightforward. To add a new kernel, follow these steps:

  1. Create a new directory under kernels/<your_kernel>/.
  2. Define the exact shapes and entrypoints in meta.json.
  3. Implement the PyTorch reference inside reference.py.
  4. Implement the Triton logic inside triton_impl.py.
  5. Run tribench test --kernel <your_kernel> to verify correctness.

To evaluate against another variant (like a sequential baseline), simply declare it under the variants block in meta.json:

"entrypoints": {
    "triton": "fused.py:run",
    "variants": { 
        "sequential": "baseline.py:run" 
    }
}

🙏 Acknowledgements

TriBench incorporates high-performance Triton kernels and ideas from several outstanding open-source projects:


Built with ❤️ by the TriBench Team