I'm currently exploring an energy optimization problem but find myself unsure how to proceed effectively.
To reduce the energy consumption of a memory-bound task, one idea is to lower the computational frequency (or computational capability) of the device. This approach would decrease power consumption while having a minimal effect on latency, thereby reducing overall energy usage (E = P × t).
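To make the E = P × t intuition concrete, here is a toy calculation with purely made-up numbers (a 30% power reduction against a 5% latency penalty at a lower clock):

```python
# Toy numbers (hypothetical) illustrating E = P * t for a memory-bound kernel.
P_high, t_high = 200.0, 1.00   # watts, seconds at the default SM clock
P_low,  t_low  = 140.0, 1.05   # watts, seconds at a reduced SM clock (latency barely grows)

E_high = P_high * t_high       # 200 J
E_low  = P_low  * t_low        # 147 J
print(f"Energy saving: {100 * (1 - E_low / E_high):.1f}%")  # ~26.5%
```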
Does this idea hold merit?
If so, how can we determine the optimal trade-off point? In cases where power cannot be directly measured, could specific PMU events or metrics help identify the transition point between memory-bound and compute-bound as the computational frequency decreases? For instance, in CUDA, metrics such as `smsp__average_warps_issue_stalled_long_scoreboard_per_issue_active.ratio` and `smsp__average_warps_issue_stalled_math_pipe_throttle_per_issue_active.ratio` might be relevant.
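As a concrete way to search for that trade-off point, here is a rough sketch of the sweep I have in mind. Assumptions: a discrete GPU whose driver supports clock locking via `nvidia-smi -lgc`, Nsight Compute (`ncu`) on PATH, root access, and a placeholder benchmark binary `./my_kernel`; the clock steps are made up. (On a Jetson, where the EMC clock is exposed, clocks would instead be set through sysfs or `jetson_clocks`.)

```python
# Rough sketch: sweep locked SM clocks, recording runtime, average power, and energy.
# Hypothetical names/values: ./my_kernel, the clock list (query real steps with
# `nvidia-smi -q -d SUPPORTED_CLOCKS`). Requires root and a recent driver.
import statistics
import subprocess
import time

CLOCKS_MHZ = [1800, 1500, 1200, 900, 600]   # hypothetical SM clock steps
METRICS = ",".join([
    "smsp__average_warps_issue_stalled_long_scoreboard_per_issue_active.ratio",
    "smsp__average_warps_issue_stalled_math_pipe_throttle_per_issue_active.ratio",
])

def read_power_w() -> float:
    """Read the current board power draw in watts via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        text=True)
    return float(out.split()[0])

for clk in CLOCKS_MHZ:
    subprocess.run(["nvidia-smi", "-lgc", f"{clk},{clk}"], check=True)  # lock SM clock
    t0 = time.time()
    proc = subprocess.Popen(["./my_kernel"])          # placeholder workload
    samples = [read_power_w()]                        # sample power while it runs
    while proc.poll() is None:
        time.sleep(0.1)
        samples.append(read_power_w())
    elapsed = time.time() - t0
    avg_p = statistics.mean(samples)
    print(f"{clk} MHz: t={elapsed:.2f}s  P~{avg_p:.1f}W  E~{avg_p * elapsed:.1f}J")

    # Separate profiling pass for the stall ratios (ncu adds overhead, so it is
    # kept out of the timing/power measurement above).
    subprocess.run(["ncu", "--metrics", METRICS, "./my_kernel"], check=True)

subprocess.run(["nvidia-smi", "-rgc"], check=True)    # restore default clock behaviour
```

The idea would be to stop lowering the clock once the measured energy (or the long-scoreboard stall ratio) stops improving, i.e. where the kernel flips from memory-bound to compute-bound.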
Additionally, could a similar strategy be applied to CPUs? Conversely, for compute-bound tasks, would reducing the EMC (memory controller) frequency, and hence the memory bandwidth, be an effective approach?
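For the CPU case, the analogous sweep I am considering would cap the core frequency with `cpupower` and watch generic stall counters with `perf`; `./my_workload` and the frequency list are placeholders, and the `stalled-cycles-backend` alias is not available on every microarchitecture:

```python
# Rough sketch of the CPU analogue: cap the core frequency and watch how runtime,
# IPC, and backend stalls respond. Requires root, cpupower, and perf.
import subprocess

FREQS_GHZ = [3.5, 3.0, 2.5, 2.0, 1.5]                 # hypothetical frequency ceilings
EVENTS = "cycles,instructions,stalled-cycles-backend"

for f in FREQS_GHZ:
    subprocess.run(["cpupower", "frequency-set", "-u", f"{f}GHz"], check=True)
    # For a memory-bound region, wall-clock time stays roughly flat as the frequency
    # drops (IPC rises, backend stalls shrink); once runtime starts scaling with the
    # frequency cut, the task has become compute-bound and further slowdown wastes energy.
    subprocess.run(["perf", "stat", "-e", EVENTS, "./my_workload"], check=True)

# Afterwards, restore the original ceiling with `cpupower frequency-set -u <max>`.
```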