Prototyping CPU Thermal Management on the Renesas R-Car H2 Platform

Purpose

The purpose of this page is to document how to enable basic CPU thermal management on Renesas platforms (R-Car H2 being the primary target) in a generic way.

DISCLAIMER

FOR ILLUSTRATIVE PURPOSES ONLY! DVFS and thermal management parameters (trip points, CPU OPPs, policy) are all empirical and have not been optimized. Platform stability and reliability are not guaranteed.

Known bugs/limitations

  • As H2 silicon characterization data are not available, all CPU OPPs use the same voltage. Only the CPU frequency is scaled, which limits the efficiency of thermal management.
  • We also tried to enable voltage scaling (defining empirical supply voltages). However, even though no error was detected on the software side, the supply voltage did not physically change. We traced the I2C data written to the chip registers and it looked coherent (a data byte was updated following the voltage formula “300mV + 10mV*step”). As we could not get the DA9210 datasheet (including the register map) in time to analyze the register configuration, we could not debug it further.
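The voltage formula observed in the I2C traces can be inverted to check which step value a given target voltage should produce, which helps when comparing traced bytes against expected ones. The following is a minimal Python sketch, assuming the observed “300mV + 10mV*step” relation holds linearly over the whole range and that the whole data byte encodes the step (0..255). Both assumptions, and the function names, are hypothetical, since the DA9210 register map was not available.

```python
# Sketch based only on the formula observed in the I2C traces:
#   V(mV) = 300 + 10 * step
# MAX_STEP is an assumption (one data byte), not datasheet information.
BASE_MV = 300
STEP_MV = 10
MAX_STEP = 255


def step_to_mv(step: int) -> int:
    """Return the supply voltage (mV) that a given step value should select."""
    if not 0 <= step <= MAX_STEP:
        raise ValueError(f"step {step} out of range 0..{MAX_STEP}")
    return BASE_MV + STEP_MV * step


def mv_to_step(mv: int) -> int:
    """Return the step value for a target voltage (mV), rejecting unreachable values."""
    offset = mv - BASE_MV
    if offset < 0 or offset % STEP_MV != 0:
        raise ValueError(f"{mv} mV is not reachable in {STEP_MV} mV steps from {BASE_MV} mV")
    step = offset // STEP_MV
    if step > MAX_STEP:
        raise ValueError(f"{mv} mV exceeds the maximum step {MAX_STEP}")
    return step


if __name__ == "__main__":
    # Example target voltages; 1000 mV matches the 1V currently used by all OPPs.
    for target_mv in (1000, 950, 900):
        step = mv_to_step(target_mv)
        print(f"{target_mv} mV -> step {step} (0x{step:02x})")
```

If the traced byte for a 1V request does not match the computed step, the byte may carry additional fields or an offset that only the register map can explain.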

How To

The upstream kernel already supports the H2 temperature sensor (rcar-thermal driver), but provides neither a thermal policy nor a cpufreq driver for it.

  1. (Optional) Add Ethernet support to the device tree (to support BOOTP)
  2. Fix clock frequency changes using the kick bit
  3. Add CPUFreq support (originally written by Guennadi Liakhovetski and updated with Device Tree support)
  4. Add the missing clock handling to the rcar-thermal driver when using the device tree (temporary HACK until DT support is fixed)
  5. (Optional) Minor updates to the rcar-thermal driver
  6. Register a cpufreq cooling device, add a passive trip point and bind it to thermal zone 0
  7. (Optional) Boot all 8 H2 cores (4 C-A15 + 4 C-A7)
    1. Enable multicluster operation on the kernel command line using “apmu=multicluster”
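Once the kernel is booted, a quick way to confirm that steps 6 and 7 took effect is to read the generic thermal and CPU sysfs files. The following is a hypothetical Python helper (not part of the branches mentioned on this page); it assumes the cooling device is bound to thermal_zone0 and that the kernel exposes the standard thermal sysfs attributes (type, temp, trip_point_N_type/temp, cdevN, cdevN_trip_point), whose exact set may vary between kernel versions.

```python
from pathlib import Path

THERMAL_ZONE = Path("/sys/class/thermal/thermal_zone0")
CPU_ONLINE = Path("/sys/devices/system/cpu/online")
CMDLINE = Path("/proc/cmdline")


def parse_cpu_list(text: str) -> list[int]:
    """Expand a kernel CPU list such as "0-3,5" into [0, 1, 2, 3, 5]."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def has_kernel_param(cmdline: str, param: str) -> bool:
    """Return True if `param` appears as a whole word on the kernel command line."""
    return param in cmdline.split()


def read(path: Path) -> str:
    return path.read_text().strip()


def report() -> None:
    # Step 6: thermal zone 0 should expose a passive trip point bound to a
    # cpufreq cooling device. Temperatures are in millidegrees Celsius.
    print("zone type:", read(THERMAL_ZONE / "type"))
    print("temperature (mC):", read(THERMAL_ZONE / "temp"))
    for trip in sorted(THERMAL_ZONE.glob("trip_point_*_type")):
        idx = trip.name.split("_")[2]
        temp = read(THERMAL_ZONE / f"trip_point_{idx}_temp")
        print(f"trip {idx}: {read(trip)} at {temp} mC")
    for cdev in sorted(THERMAL_ZONE.glob("cdev*_trip_point")):
        name = cdev.name.removesuffix("_trip_point")
        cdev_type = read(THERMAL_ZONE / name / "type")  # cdevN links to the cooling device
        print(f"{name} ({cdev_type}) bound to trip {read(cdev)}")
    # Step 7: with apmu=multicluster, all 8 cores are expected to be online.
    print("apmu=multicluster:", has_kernel_param(read(CMDLINE), "apmu=multicluster"))
    print("online CPUs:", parse_cpu_list(read(CPU_ONLINE)))


if __name__ == "__main__":
    report()
```

A passive trip point listed together with a cdev entry pointing at its index indicates that the binding from step 6 is in place.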

All these changes are available in this branch, and a version with additional debug traces is available in this branch.

Results

The temperature sensor reading and the CPU0 clock frequency were traced over time in 3 different scenarios:

  1. No Thermal Management, CPU Fan ON
  2. No Thermal Management, CPU Fan OFF
  3. Prototype Thermal Management Enabled, CPU Fan OFF

Under the following conditions:

  • 4 C-A15 CPU cores 100% loaded
  • C-A7 cores not booted
  • Passive Trip Point set to 35C

The 4 C-A15 CPU cores were loaded using the cpuloadgen tool, whose sources can be found here.

The trace files and gnuplot scripts can be found here (see the included README file for further details and instructions).
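For readers who want to record similar traces on their own board, here is a minimal Python sketch of a sampler. It is not the script used to produce the traces linked above; it assumes the standard sysfs files (thermal_zone0/temp in millidegrees Celsius, cpu0/cpufreq/scaling_cur_freq in kHz), and the 1 s period and 600 s duration are hypothetical defaults.

```python
import sys
import time
from pathlib import Path

TEMP_PATH = Path("/sys/class/thermal/thermal_zone0/temp")  # millidegrees Celsius
FREQ_PATH = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")  # kHz


def format_sample(t_s: float, temp_millic: int, freq_khz: int) -> str:
    """Format one sample as 'time(s) temperature(C) frequency(MHz)' for gnuplot."""
    return f"{t_s:.1f} {temp_millic / 1000:.1f} {freq_khz / 1000:.0f}"


def trace(period_s: float = 1.0, duration_s: float = 600.0, out=sys.stdout) -> None:
    """Sample thermal zone 0 temperature and CPU0 frequency every `period_s` seconds."""
    print("# time_s temp_C freq_MHz", file=out)
    start = time.monotonic()
    while (elapsed := time.monotonic() - start) < duration_s:
        temp = int(TEMP_PATH.read_text())
        freq = int(FREQ_PATH.read_text())
        print(format_sample(elapsed, temp, freq), file=out, flush=True)
        time.sleep(period_s)


if __name__ == "__main__":
    trace()
```

Redirecting the output to a file gives a whitespace-separated table; column 1 (time) can then be plotted against column 2 (temperature) and column 3 (frequency) with gnuplot, ideally with the frequency on a second y axis.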

Below is a plot of these data:

As can be seen, as soon as the temperature reaches 35C (trip point 0), the thermal policy (step-wise) lowers the CPU OPP by one step, and each time the temperature rises by another 5C, the CPU speed is lowered by one more step. However, since only CPU frequency scaling is used (all CPU OPPs share the same 1V supply voltage), this is not sufficient to stop the temperature increase; it only slows it down. This is not really surprising: in thermal management, voltage scaling is the real key, because dynamic power scales with the square of the supply voltage but only linearly with frequency, and leakage current rises steeply with both voltage and temperature.
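The behavior described above, and the reason frequency-only scaling is weak, can be illustrated with a small model. This Python sketch encodes the observed trend (trip point at 35C, one more OPP step per additional 5C) and the standard CMOS dynamic power approximation P ∝ V²·f, ignoring leakage. It is not the kernel's step_wise governor, which, roughly, moves the cooling state one step at a time according to the temperature trend sampled at each polling interval. The `max_state` value and the 0.9V voltage are hypothetical.

```python
TRIP_C = 35.0    # passive trip point used in the experiment
STEP_C = 5.0     # observed: one more OPP step per additional 5C
V_NOMINAL = 1.0  # all OPPs use 1V in this prototype


def cooling_state(temp_c: float, max_state: int,
                  trip_c: float = TRIP_C, step_c: float = STEP_C) -> int:
    """Number of OPP steps removed at a given temperature (observed trend only)."""
    if temp_c < trip_c:
        return 0
    return min(max_state, 1 + int((temp_c - trip_c) // step_c))


def relative_dynamic_power(v: float, f: float, v0: float, f0: float) -> float:
    """Dynamic power ratio P/P0, assuming P is proportional to V^2 * f (no leakage)."""
    return (v / v0) ** 2 * (f / f0)


if __name__ == "__main__":
    for t in (30, 35, 40, 45, 50):
        print(f"{t}C -> {cooling_state(t, max_state=4)} step(s) down")
    # Halving the frequency at constant voltage only halves dynamic power...
    print(relative_dynamic_power(V_NOMINAL, 0.5, V_NOMINAL, 1.0))
    # ...while also lowering the voltage (hypothetical 0.9V) reduces it further.
    print(relative_dynamic_power(0.9, 0.5, V_NOMINAL, 1.0))
```

With voltage fixed, every OPP step trades performance for a purely linear power reduction, and leakage, which grows with temperature, is left untouched.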

Going further

Possible directions for improving thermal management:

  • Get silicon characterization data to optimize CPU OPPs, notably voltage levels.
  • Define another trip point above which CPU cores could be hot-unplugged to further reduce leakage current.
  • If thermal management is enabled by default on future Renesas development platforms, add circuitry so that the (noisy…) CPU fan can be controlled by the thermal management policy.
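The hotplug idea can be sketched in user space on top of the standard cpuN/online sysfs files. The following Python sketch is hypothetical: the 45C second trip point, the 5C step, the rule of removing the highest-numbered CPUs first and never CPU0 are all illustrative policy choices, not values validated on H2.

```python
from pathlib import Path

CPU_SYSFS = Path("/sys/devices/system/cpu")
HOTPLUG_TRIP_C = 45.0  # hypothetical second trip point, above the 35C passive one


def cpus_to_unplug(temp_c: float, online: list[int], trip_c: float = HOTPLUG_TRIP_C,
                   step_c: float = 5.0, keep: int = 1) -> list[int]:
    """Return the CPUs to take offline at temp_c.

    One CPU is removed at trip_c, plus one more per additional step_c degrees.
    CPU0 is never returned and at least `keep` CPUs stay online.
    Highest-numbered CPUs are removed first (arbitrary policy choice).
    """
    if temp_c < trip_c:
        return []
    wanted = 1 + int((temp_c - trip_c) // step_c)
    candidates = sorted((c for c in online if c != 0), reverse=True)
    limit = max(0, len(online) - keep)
    return candidates[:min(wanted, limit)]


def set_cpu_online(cpu: int, online: bool, root: Path = CPU_SYSFS) -> None:
    """Write 1/0 to cpuN/online (needs root; CPU0 often cannot be unplugged)."""
    (root / f"cpu{cpu}" / "online").write_text("1" if online else "0")


if __name__ == "__main__":
    print(cpus_to_unplug(50.0, [0, 1, 2, 3]))  # -> [3, 2]
```

Which cores to remove first (for instance one cluster before the other) is a policy decision that would need the same silicon characterization data as the OPPs.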
renesas_r-car_h2/thermal_management.txt · Last modified: 2014/03/05 11:00 by ptitiano