
Report Generation

GPUFlight can generate a text-based performance report after a profiling session. The report summarizes kernel execution, memory transfers, system metrics, scope timing, and profile analysis.

C++ API

After calling shutdown(), call generateReport():

gpufl::shutdown();

// Print to console
gpufl::generateReport();

// Save to file
gpufl::generateReport("report.txt");

generateReport() automatically reads the log files from the most recent init() call and includes only the latest session.

Python API

Quick One-Liner

from gpufl.report import generate_report

text = generate_report("./logs", log_prefix="my_app")
print(text)

Using the TextReport Class

from gpufl.report import TextReport
from gpufl.analyzer import GpuFlightSession

session = GpuFlightSession("./logs", log_prefix="my_app")
report = TextReport(session, top_n=10)
report.print()              # print to stdout
report.save("report.txt")   # save to file

Report Sections

A typical report includes:

Session Summary

Application name, session ID, duration, GPU device info (name, compute capability, SM/CU count).

Kernel Execution Summary

Total kernels, unique kernels, total GPU time, GPU busy percentage, duration statistics (avg/median/min/max).
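
As a rough illustration of how these figures relate to each other, the sketch below derives them from a list of kernel records. It does not use GPUFlight's own code: the `(name, duration_ms)` record shape and the `kernel_summary` helper are hypothetical, and the busy percentage is assumed to be summed kernel time divided by session wall time, which ignores overlapping kernels.

```python
from statistics import mean, median


def kernel_summary(kernels, session_duration_ms):
    """Summarize hypothetical (name, duration_ms) kernel records."""
    durations = [duration_ms for _, duration_ms in kernels]
    total_ms = sum(durations)
    return {
        "total_kernels": len(kernels),
        "unique_kernels": len({name for name, _ in kernels}),
        "total_gpu_time_ms": total_ms,
        # Assumption: busy % = summed kernel time / session wall time.
        "gpu_busy_pct": 100.0 * total_ms / session_duration_ms,
        "avg_ms": mean(durations),
        "median_ms": median(durations),
        "min_ms": min(durations),
        "max_ms": max(durations),
    }


records = [("matmul_kernel", 40.0), ("relu_kernel", 10.0), ("matmul_kernel", 50.0)]
summary = kernel_summary(records, session_duration_ms=200.0)
print(summary)
```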

Top Kernels by GPU Time

Ranked table showing each kernel's call count, total time, average time, and max time.
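
The ranking can be pictured as a group-by over kernel names followed by a sort on total time. The following is a minimal sketch under that assumption; the `top_kernels` helper and the `(name, duration_ms)` records are hypothetical, while `top_n` mirrors the parameter passed to TextReport above.

```python
from collections import defaultdict


def top_kernels(kernels, top_n=10):
    """Rank hypothetical (name, duration_ms) records by total GPU time."""
    by_name = defaultdict(list)
    for name, duration_ms in kernels:
        by_name[name].append(duration_ms)

    rows = [
        {
            "kernel": name,
            "calls": len(durations),
            "total_ms": sum(durations),
            "avg_ms": sum(durations) / len(durations),
            "max_ms": max(durations),
        }
        for name, durations in by_name.items()
    ]
    # Largest total GPU time first, then keep the first top_n rows.
    rows.sort(key=lambda row: row["total_ms"], reverse=True)
    return rows[:top_n]


launches = [("relu_kernel", 10.0), ("matmul_kernel", 40.0), ("matmul_kernel", 50.0)]
ranking = top_kernels(launches, top_n=10)
for rank, row in enumerate(ranking, start=1):
    print(rank, row["kernel"], row["calls"], row["total_ms"], row["avg_ms"], row["max_ms"])
```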

Kernel Details

Per-kernel occupancy breakdown: grid/block dimensions, register/shared memory/warp occupancy, limiting resource.
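
To show what a limiting resource means, here is a simplified occupancy estimate. It is a sketch, not the calculation GPUFlight performs: the `occupancy_estimate` helper is hypothetical, the per-SM limits are illustrative defaults for an NVIDIA compute capability 8.6 device, and register and shared-memory allocation granularity is ignored, so real tools may report different numbers.

```python
def occupancy_estimate(threads_per_block, regs_per_thread, smem_per_block,
                       max_warps_per_sm=48, regs_per_sm=65536,
                       smem_per_sm=102400, max_blocks_per_sm=16, warp_size=32):
    """Estimate per-resource occupancy and the limiting resource.

    Simplified model: allocation granularity is ignored and the per-SM
    limits are assumed values, not queried from the device.
    """
    warps_per_block = -(-threads_per_block // warp_size)  # ceiling division
    regs_per_block = max(regs_per_thread, 1) * warps_per_block * warp_size

    # How many blocks fit on one SM if only this resource mattered.
    block_limits = {
        "warps": max_warps_per_sm // warps_per_block,
        "registers": regs_per_sm // regs_per_block,
        "shared_memory": (smem_per_sm // smem_per_block
                          if smem_per_block > 0 else max_blocks_per_sm),
    }

    occupancy = {}
    for resource, blocks in block_limits.items():
        blocks = min(blocks, max_blocks_per_sm)
        active_warps = min(blocks * warps_per_block, max_warps_per_sm)
        occupancy[resource] = active_warps / max_warps_per_sm

    limiting = min(occupancy, key=occupancy.get)
    return occupancy, limiting


per_resource, limiting_resource = occupancy_estimate(
    threads_per_block=256, regs_per_thread=64, smem_per_block=0)
print(per_resource, limiting_resource)
```

In this example each block needs 16384 registers, so only four blocks (32 warps) fit on an SM and registers are the limiting resource.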

Memory Transfer Summary

Transfers grouped by direction (HtoD, DtoH), with byte counts and average throughput in GB/s.
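
The grouping can be sketched as below. The `(direction, num_bytes, duration_s)` record shape and the `transfer_summary` helper are hypothetical, and the sketch assumes that average throughput means total bytes divided by total transfer time, with 1 GB taken as 1e9 bytes.

```python
from collections import defaultdict


def transfer_summary(transfers):
    """Group hypothetical (direction, num_bytes, duration_s) records."""
    totals = defaultdict(lambda: {"count": 0, "bytes": 0, "seconds": 0.0})
    for direction, num_bytes, duration_s in transfers:
        entry = totals[direction]
        entry["count"] += 1
        entry["bytes"] += num_bytes
        entry["seconds"] += duration_s

    for entry in totals.values():
        # Assumption: throughput = total bytes / total time, 1 GB = 1e9 bytes.
        if entry["seconds"] > 0:
            entry["avg_gb_per_s"] = entry["bytes"] / entry["seconds"] / 1e9
        else:
            entry["avg_gb_per_s"] = 0.0
    return dict(totals)


copies = [
    ("HtoD", 2_000_000_000, 0.25),
    ("HtoD", 1_000_000_000, 0.25),
    ("DtoH", 500_000_000, 0.1),
]
by_direction = transfer_summary(copies)
for direction, entry in by_direction.items():
    print(direction, entry["count"], entry["bytes"], round(entry["avg_gb_per_s"], 2))
```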

System Metrics

  • GPU: Utilization, temperature, power, VRAM, clock speeds
  • AMD extended: Junction/memory temp, fan speed, voltage, energy, PCIe bandwidth, ECC errors
  • Host: CPU utilization, RAM usage

Scope Summary

Scope timing (begin/end pairs) and GPU time attribution per scope.
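
One way to picture GPU time attribution is to assign each kernel to the scope whose begin/end interval contains the kernel's start timestamp. The sketch below follows that rule as an assumption; the source does not describe GPUFlight's actual attribution logic, and the `gpu_time_per_scope` helper and record shapes are hypothetical.

```python
def gpu_time_per_scope(scopes, kernels):
    """Attribute kernel time to scopes.

    scopes:  hypothetical (name, begin_ms, end_ms) tuples
    kernels: hypothetical (start_ms, duration_ms) tuples
    Assumption: a kernel belongs to a scope if it starts inside it.
    """
    result = {}
    for name, begin_ms, end_ms in scopes:
        gpu_ms = sum(duration_ms for start_ms, duration_ms in kernels
                     if begin_ms <= start_ms < end_ms)
        result[name] = {"wall_ms": end_ms - begin_ms, "gpu_ms": gpu_ms}
    return result


scope_records = [("forward", 0.0, 100.0), ("backward", 100.0, 250.0)]
kernel_records = [(10.0, 40.0), (60.0, 20.0), (120.0, 90.0)]
per_scope = gpu_time_per_scope(scope_records, kernel_records)
print(per_scope)
```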

Profile / SASS Analysis

Stall reason distribution, per-kernel stall breakdown, SASS/ISA metric totals, and thread divergence analysis (warp/wavefront efficiency).
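
Two of these figures are simple ratios. The sketch below assumes that the stall distribution is each reason's share of all stall samples, and that warp (or wavefront) efficiency is the average number of active threads per executed instruction divided by the warp size. The helper names, the stall-reason labels, and the input shapes are hypothetical and are not taken from GPUFlight.

```python
def stall_distribution(sample_counts):
    """Convert {reason: sample_count} into percentages, largest first."""
    total = sum(sample_counts.values())
    if total == 0:
        return {}
    ordered = sorted(sample_counts.items(), key=lambda item: item[1], reverse=True)
    return {reason: 100.0 * count / total for reason, count in ordered}


def warp_efficiency_pct(active_thread_counts, warp_size=32):
    """Average active threads per executed instruction, as % of warp size.

    Pass warp_size=64 for hardware that uses 64-wide wavefronts.
    """
    if not active_thread_counts:
        return 0.0
    return 100.0 * sum(active_thread_counts) / (len(active_thread_counts) * warp_size)


# Hypothetical stall-reason labels and sample counts.
stalls = stall_distribution({"barrier": 30, "long_scoreboard": 60, "not_selected": 10})
efficiency = warp_efficiency_pct([32, 16, 32, 16])
print(stalls, efficiency)
```

An efficiency well below 100% suggests that many instructions ran with part of the warp masked off, which is the usual signature of divergent branches.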

Example Output

===============================================================================
GPU Flight Session Report
===============================================================================

===============================================================================
Session Summary
===============================================================================
Application: my_app
Session ID: abc12345-...
Duration: 2.34 s
GPU Device: NVIDIA GeForce RTX 3090
Compute: 8.6
SMs: 82

===============================================================================
Kernel Execution Summary
===============================================================================
Total Kernels: 42
Unique Kernels: 2
Total GPU Time: 1.23 s
GPU Busy: 52.6%

===============================================================================
Top 10 Kernels by Total GPU Time
===============================================================================
 #  Kernel           Calls       Total        Avg        Max
--------------------------------------------------------------------------
 1  matmul_kernel       21   845.23 ms   40.25 ms  156.42 ms
 2  relu_kernel         21   384.91 ms   18.33 ms   42.10 ms