Report Generation

GPUFlight can generate a text-based performance report after a profiling session. The report summarizes kernel execution, memory transfers, system metrics, scope timing, and profile analysis.

C++ API

After calling shutdown(), call generateReport():

gpufl::shutdown();

// Print to console
gpufl::generateReport();

// Save to file
gpufl::generateReport("report.txt");

The report automatically reads the log files from the most recent init() call and filters to the latest session.

Python API

Quick One-Liner

from gpufl.report import generate_report

text = generate_report("./logs", log_prefix="my_app")
print(text)

Using the TextReport Class

from gpufl.report import TextReport
from gpufl.analyzer import GpuFlightSession

session = GpuFlightSession("./logs", log_prefix="my_app")
report = TextReport(session, top_n=10)
report.print()         # print to stdout
report.save("report.txt")  # save to file

Report Sections

A typical report includes:

Session Summary

Application name, session ID, duration, GPU device info (name, compute capability, SM/CU count).

Kernel Execution Summary

Total kernels, unique kernels, total GPU time, GPU busy percentage, duration statistics (avg/median/min/max).

Top Kernels by GPU Time

Ranked table showing each kernel's call count, total time, average time, and max time.

Kernel Details

Per-kernel occupancy breakdown: grid/block dimensions, register/shared memory/warp occupancy, limiting resource.

Memory Transfer Summary

Transfers grouped by direction (HtoD, DtoH), with byte counts and average throughput in GB/s.

System Metrics

GPU: Utilization, temperature, power, VRAM, clock speeds
AMD extended: Junction/memory temp, fan speed, voltage, energy, PCIe bandwidth, ECC errors
Host: CPU utilization, RAM usage

Scope Summary

Scope timing (begin/end pairs) and GPU time attribution per scope.

Profile / SASS Analysis

Stall reason distribution, per-kernel stall breakdown, SASS/ISA metric totals, and thread divergence analysis (warp/wavefront efficiency).

Example Output

===============================================================================
                           GPU Flight Session Report
===============================================================================

===============================================================================
  Session Summary
===============================================================================
  Application:          my_app
  Session ID:           abc12345-...
  Duration:             2.34 s
  GPU Device:           NVIDIA GeForce RTX 3090
    Compute:            8.6
    SMs:                82

===============================================================================
  Kernel Execution Summary
===============================================================================
  Total Kernels:        42
  Unique Kernels:       3
  Total GPU Time:       1.23 s
  GPU Busy:             52.6%

===============================================================================
  Top 10 Kernels by Total GPU Time
===============================================================================
  #   Kernel                          Calls   Total       Avg         Max
  --------------------------------------------------------------------------
  1   matmul_kernel                      21   845.23 ms   40.25 ms   156.42 ms
  2   relu_kernel                        21   384.91 ms   18.33 ms    42.10 ms

C++ API​

Python API​

Quick One-Liner​

Using the TextReport Class​

Report Sections​

Session Summary​

Kernel Execution Summary​

Top Kernels by GPU Time​

Kernel Details​

Memory Transfer Summary​

System Metrics​

Scope Summary​

Profile / SASS Analysis​

Example Output​