PolyUFC: Polyhedral Compilation Meets Roofline Analysis for Uncore Frequency Capping

Nilesh Rajendra Shah, M V V S Manoj Kumar, Dhairya Baxi, and Ramakrishna Upadrasta

Published in CGO’26

APPENDIX

App. A lists the variables used in the mathematical model of Sec. 2. App. B compares uncore frequency capping with uncore frequency scaling. We also describe the experiments performed to obtain the performance and power rooflines, followed by a pointer-chasing algorithm for measuring the miss penalty.¹

Variables	Description
\(T^{\Omega}_{\mathcal{I}}\)	Total time taken for floating point operations
\(T^{Q}_{f_c,\mathcal{I}}\)	Total time taken for memory operations with \(f_c\) and \(\mathcal{I}\) as parameters
\(\Omega\)	Total number of floating point operations
\(Q_{\text{DRAM}}\)	Total number of bytes transferred between \(LLC \leftrightarrow DRAM\)
\(\widehat{P}_{f_c,\text{DRAM}}\)	Peak power per byte transfer between \(LLC \leftrightarrow DRAM\)
\(P^{\text{core}}_{\mathcal{I}},\; P^{\text{uncore}}_{f_c,\mathcal{I}}\)	Total power consumption of the core and uncore
\(P_{f_c,\mathcal{I}}\)	Total power consumption of the package
\(f_c\)	Frequency cap of the uncore
\(\rho^{h}_{c_i},\; \rho^{m}_{c_i}\)	Hit/Miss ratio of cache level \(i\) \((1 \le i \le N)\)
\(\mathcal{H}_{c_i}\)	Hit time to access data in cache level \(i\)
\(\mathcal{M}^{t}_{f_c,LLC},\; \mathcal{M}^{p}_{f_c,LLC}\)	Miss time and power to access data in cache level \(LLC\)

Table I. Variables and their descriptions.
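To make the cache-related variables in Table I concrete, the sketch below combines per-level hit ratios \(\rho^{h}_{c_i}\), hit times \(\mathcal{H}_{c_i}\), and the LLC miss time \(\mathcal{M}^{t}_{f_c,LLC}\) into an average access time. It assumes the textbook average-memory-access-time form with local hit ratios, which may differ from the exact model of Sec. 2; the function name and the sample numbers are hypothetical.

```python
def average_access_time(hit_ratios, hit_times, llc_miss_time):
    """Average time per memory access (assumed textbook AMAT form).

    hit_ratios[i]  : local hit ratio of cache level i+1 (rho^h_{c_i})
    hit_times[i]   : time to serve a hit at cache level i+1 (H_{c_i})
    llc_miss_time  : time to serve an access that misses the LLC (M^t)
    """
    if len(hit_ratios) != len(hit_times):
        raise ValueError("need one hit time per cache level")
    reach = 1.0   # fraction of accesses that reach the current level
    total = 0.0
    for rho_h, h in zip(hit_ratios, hit_times):
        total += reach * rho_h * h   # accesses served at this level
        reach *= 1.0 - rho_h         # the rest miss (rho^m = 1 - rho^h)
    return total + reach * llc_miss_time


# Hypothetical two-level hierarchy, times in arbitrary units.
amat = average_access_time([0.9, 0.5], [1.0, 10.0], 100.0)
```

In this form the miss time depends on \(f_c\), so lowering the uncore cap only affects the last term, weighted by the fraction of accesses that miss every cache level.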

B. Frequency scaling vs. capping

To understand the difference between uncore frequency scaling and uncore frequency capping, we compare the two techniques. In Fig. 1, capping limits only the maximum uncore frequency, whereas scaling fixes the uncore frequency for the entire runtime of the program. We compare the scaled and capped versions of conv2d and gemver. Frequency capping provides more fine-grained control over the frequency range and reduces the latencies caused by frequency changes. Therefore, when the goal is performance, capping is preferable to scaling under compiler-generated frequency control. For conv2d, capping achieves \(5.72\times\) better performance than scaling over the uncore frequency range. However, when the goal is energy, scaling remains a viable option, with up to \(11\%\) more energy improvement than capping.
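The distinction can be illustrated with a minimal sketch of how the two settings might be encoded on Intel platforms. It assumes the uncore limits are programmed through the `MSR_UNCORE_RATIO_LIMIT` register (address `0x620`, maximum ratio in bits 6:0, minimum ratio in bits 14:8, in units of 100 MHz); the source does not state which interface the authors used, and the helper names and frequencies below are hypothetical. The code only computes register values and does not touch the hardware.

```python
UNCORE_RATIO_LIMIT_MSR = 0x620   # assumed interface: Intel MSR_UNCORE_RATIO_LIMIT
RATIO_STEP_MHZ = 100             # one ratio step corresponds to 100 MHz


def encode_ratio_limit(min_mhz: int, max_mhz: int) -> int:
    """Pack minimum and maximum uncore frequencies into the MSR layout."""
    if min_mhz > max_mhz:
        raise ValueError("minimum frequency exceeds maximum frequency")
    min_ratio = min_mhz // RATIO_STEP_MHZ
    max_ratio = max_mhz // RATIO_STEP_MHZ
    if not (0 <= min_ratio <= 0x7F and 0 <= max_ratio <= 0x7F):
        raise ValueError("ratio does not fit in 7 bits")
    return (min_ratio << 8) | max_ratio


def cap_uncore(hw_min_mhz: int, cap_mhz: int) -> int:
    """Capping: lower only the upper bound; hardware may still scale below it."""
    return encode_ratio_limit(hw_min_mhz, cap_mhz)


def pin_uncore(freq_mhz: int) -> int:
    """Scaling as described above: min == max fixes the frequency."""
    return encode_ratio_limit(freq_mhz, freq_mhz)


# Hypothetical platform with a 1.2 GHz hardware minimum and a 2.0 GHz target.
capped = cap_uncore(1200, 2000)   # range [1.2 GHz, 2.0 GHz]
pinned = pin_uncore(2000)         # exactly 2.0 GHz
```

With capping, the hardware keeps its own frequency governor active inside the allowed range, which is what leaves room for the finer-grained behavior described above.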

TABLE II

Performance characterization of selected PolyBench [6] kernels on BDW and RPL: static (PolyUFC) vs. dynamic (HW).

Kernels	BDW PolyUFC	BDW HW	RPL PolyUFC	RPL HW
2mm	CB	CB	CB	CB
3mm	CB	CB	CB	CB
atax	BB	BB	BB	BB
bicg	BB	BB	BB	BB
gemm	CB	CB	CB	CB
gemver	BB	BB	BB	BB
gesummv	BB	BB	BB	BB
mvt	BB	BB	BB	BB
syr2k	BB	CB	CB	CB
syrk	CB	CB	CB	CB
doitgen	CB	CB	CB	CB
correlation	CB	CB	CB	CB
floyd-warshall	BB	BB	BB	BB
deriche	BB	BB	BB	BB
adi	BB	BB	BB	BB
jacobi-1d	CB	CB	CB	CB
trmm	CB	CB	CB	CB
trisolv	BB	BB	BB	BB
cholesky	CB	CB	CB	CB
lu	CB	BB	CB	CB
durbin	CB	CB	CB	CB
gramschmidt	CB	BB	CB	CB

C. Kernel Characterization

Fig. 2 shows the kernel characterization on BDW. On large problem sizes, PolyUFC correctly characterizes 20 of the 22 PolyBench [6] kernels that compile within the 30-minute timeout or are parallelized using Polygeist [7] with the Pluto optimizer. Table II reports the performance characterization accuracy for PolyBench kernels on BDW and RPL.

**Fig. 2.** Performance and Power Characterization on BDW for PolyBench with large problem size. Vertically, from top to bottom, the characterization of programs shifts from **BB → CB** due to higher OI.
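The BB → CB shift with increasing OI follows from the standard roofline rule, sketched below. The sketch assumes that CB and BB denote compute-bound and bandwidth-bound, that OI is \(\Omega / Q_{\text{DRAM}}\), and that a kernel is compute-bound once its OI reaches the ridge point (peak FLOP rate divided by peak DRAM bandwidth). The peak values and kernel numbers are hypothetical, not measurements from the paper.

```python
def operational_intensity(flops: float, dram_bytes: float) -> float:
    """OI = Omega / Q_DRAM, in FLOPs per byte."""
    if dram_bytes <= 0:
        raise ValueError("DRAM traffic must be positive")
    return flops / dram_bytes


def characterize(flops: float, dram_bytes: float,
                 peak_flops_per_s: float, peak_bytes_per_s: float) -> str:
    """Return 'CB' if OI is at or above the ridge point, else 'BB'."""
    ridge = peak_flops_per_s / peak_bytes_per_s
    oi = operational_intensity(flops, dram_bytes)
    return "CB" if oi >= ridge else "BB"


# Hypothetical machine: 100 GFLOP/s peak, 50 GB/s DRAM bandwidth (ridge = 2).
PEAK_FLOPS = 100e9
PEAK_BW = 50e9

dense_kernel = characterize(2e9, 1e8, PEAK_FLOPS, PEAK_BW)    # OI = 20
stream_kernel = characterize(1e9, 4e9, PEAK_FLOPS, PEAK_BW)   # OI = 0.25
```

Under these assumptions the dense kernel lands on the compute roof and the streaming kernel on the bandwidth roof, mirroring the two labels used in Table II.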

Category / Description	Event
FLOPs SP (BDW)	FP_ARITH_INST_RETIRED:SCALAR
FLOPs DP (BDW)	FP_ARITH_INST_RETIRED:SCALAR_DOUBLE
FLOPs SP (RPL P-core)	adl_glc::FP_ARITH_INST_RETIRED:SCALAR
FLOPs DP (RPL P-core)	adl_glc::FP_ARITH_INST_RETIRED:SCALAR_DOUBLE
FLOPs SP (RPL E-core)	adl_grt::FP_ARITH_INST_RETIRED:SCALAR
FLOPs DP (RPL E-core)	adl_grt::FP_ARITH_INST_RETIRED:SCALAR_DOUBLE
LLC misses	PAPI_L3_TCM or perf::PERF_COUNT_HW_CACHE_LL:MISS
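As a rough illustration of how the counters above could feed a dynamic characterization, the sketch below derives an operational intensity from raw event counts. It assumes that each scalar event counts one floating-point operation, that every LLC miss transfers one 64-byte cache line from DRAM, and that no other traffic (for example, prefetches or write-backs) is counted; the function names and sample counts are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumed cache-line size


def flops_from_scalar_events(scalar_sp: int, scalar_dp: int) -> int:
    """Total FLOPs from the scalar SP and DP retired-instruction counts."""
    return scalar_sp + scalar_dp


def dram_bytes_from_llc_misses(llc_misses: int,
                               line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Approximate DRAM traffic as one cache line per LLC miss."""
    return llc_misses * line_bytes


def oi_from_counters(scalar_sp: int, scalar_dp: int, llc_misses: int) -> float:
    """Measured operational intensity in FLOPs per byte."""
    dram_bytes = dram_bytes_from_llc_misses(llc_misses)
    if dram_bytes == 0:
        raise ValueError("no LLC misses recorded; OI is undefined")
    return flops_from_scalar_events(scalar_sp, scalar_dp) / dram_bytes


# Hypothetical counts from one kernel run.
measured_oi = oi_from_counters(scalar_sp=0, scalar_dp=1_280_000,
                               llc_misses=10_000)
```

Comparing such a measured value against the machine's ridge point gives a hardware-side label that can be set against a static prediction.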

D. Experimental Setup and Other Details

REFERENCES