zcfh

Reputation: 121

Can a binary optimized with AutoFDO and BOLT be iteratively optimized?

By iterative optimization I mean sampling the already optimized binary to obtain pgo_bolt_bin.perf_data, and then using pgo_bolt_bin.perf_data as the profile for another round of PGO and BOLT optimization. Here is what I know so far, along with some simple tests I ran.

  1. AutoFDO on its own can be iterated: using pgo_bolt_bin.perf_data as the profile still yields an optimization benefit.
  2. BOLT on its own can also be iterated. The --enable-bat option lets it map address offsets back to the pre-optimized binary.

The problem I have run into is that when AutoFDO and BOLT are used together, the data sampled from pgo_bolt_bin cannot be used for iterative optimization, as in the following scenario.

base -- perf --> pgo0 -- perf --> pgo_bolt0
perf from pgo_bolt0 --> pgo1 -- perf from pgo_bolt0 --> pgo_bolt1
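To make the scenario above concrete, here is a minimal sketch that only builds the command lines for both rounds and does not run anything. The file names (base.perf, pgo0.prof, pgo_bolt1.fdata, ...) and the source file main.cc are hypothetical, and the flag sets are assumptions based on the commonly documented basic invocations of perf, create_llvm_prof, clang, perf2bolt and llvm-bolt, not my exact build.

```python
# Sketch only: builds command lines for the scenario, does not execute them.
# All file names and the source file main.cc are hypothetical placeholders.

def perf_record(binary, perf_data):
    """Sample a binary with branch records (needed by AutoFDO and BOLT)."""
    return [["perf", "record", "-e", "cycles:u", "-j", "any,u",
             "-o", perf_data, "--", "./" + binary]]

def autofdo_build(profiled_bin, perf_data, out_bin, src="main.cc"):
    """Convert perf data with AutoFDO, then rebuild with clang."""
    prof = out_bin + ".prof"
    return [
        ["create_llvm_prof", f"--binary={profiled_bin}",
         f"--profile={perf_data}", f"--out={prof}"],
        # --emit-relocs keeps relocations so BOLT can rewrite the binary later.
        ["clang++", "-O2", "-gline-tables-only",
         f"-fprofile-sample-use={prof}", "-Wl,--emit-relocs", src, "-o", out_bin],
    ]

def bolt_build(profiled_bin, perf_data, in_bin, out_bin):
    """Convert perf data for BOLT, then optimize in_bin with it."""
    fdata = out_bin + ".fdata"
    return [
        ["perf2bolt", "-p", perf_data, "-o", fdata, profiled_bin],
        ["llvm-bolt", in_bin, "-o", out_bin, f"-data={fdata}", "--enable-bat"],
    ]

def scenario():
    steps = []
    # Round 0: base -- perf --> pgo0 -- perf --> pgo_bolt0
    steps += perf_record("base", "base.perf")
    steps += autofdo_build("base", "base.perf", "pgo0")
    steps += perf_record("pgo0", "pgo0.perf")
    steps += bolt_build("pgo0", "pgo0.perf", "pgo0", "pgo_bolt0")
    # Round 1: both steps consume perf data sampled from pgo_bolt0.
    steps += perf_record("pgo_bolt0", "pgo_bolt0.perf")
    steps += autofdo_build("pgo_bolt0", "pgo_bolt0.perf", "pgo1")
    # The profile was collected on pgo_bolt0 but is applied to pgo1,
    # which is a different build from pgo0.
    steps += bolt_build("pgo_bolt0", "pgo_bolt0.perf", "pgo1", "pgo_bolt1")
    return steps

if __name__ == "__main__":
    for cmd in scenario():
        print(" ".join(cmd))
```

The last bolt_build call is the step in question: its profile comes from pgo_bolt0, while its input binary is pgo1.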

When I run BOLT to generate pgo_bolt1, the following warning appears:

BOLT-WARNING: 1 (100.0% of all profiled) function have invalid (possibly stale) profile. Use -report-stale to see the list.

I ran some simple tests. Because the profile data converted by AutoFDO differs between rounds, clang seems to optimize somewhat differently, so pgo0 and pgo1 are not identical. My understanding is that the instruction addresses in the binary have changed, which invalidates the mapping done by BOLT. Is there any way to solve this problem?

Upvotes: 0

Views: 57

Answers (0)
