LLVM not optimizing composition of functions in IR

I have an LLVM-based JIT compiler and am having trouble optimizing functions that follow the pattern func1(func2(x)). The issue can be demonstrated with the following example:

#include <cmath>


extern "C" double transform_4326_900913_x(const double x) {
  return x * 111319.490778;
}

extern "C" double transform_4326_900913_y(const double y) {
  return 6378136.99911 * tan(log(.00872664626 * y + .785398163397));
}

extern "C" void transform(double* out, const double x_in, const double y_in) {
    out[0] = transform_4326_900913_x(x_in);
    out[1] = transform_4326_900913_y(y_in);
}

double program(const double x_in, const double y_in) {
    double arr[2];
    transform(arr, x_in, y_in);
    return arr[0];
}

If I compile this code to LLVM IR with clang, without fast math, I get:

  %3 = fmul double %0, 0x40FB2D77DA3A083A, !dbg !416
  call void @llvm.dbg.value(metadata double %3, metadata !407, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 64)), !dbg !411
  call void @llvm.dbg.value(metadata double %1, metadata !367, metadata !DIExpression()) #4, !dbg !417
  %4 = fmul double %1, 0x3F81DF46A252DD11, !dbg !419
  %5 = fadd double %4, 0x3FE921FB54441D52, !dbg !420
  %6 = tail call double @log(double %5) #4, !dbg !421
  %7 = tail call double @tan(double %6) #4, !dbg !422
}

This is expected -- LLVM does not know whether the calls to the standard library functions log and tan can be removed without affecting the observable behavior of the program (they may set errno, for example). If I switch to -ffast-math, everything is nicely optimized:

define dso_local double @_Z7programdd(double %0, double %1) local_unnamed_addr #1 !dbg !398 {
  %3 = fmul fast double %0, 0x40FB2D77DA3A083A, !dbg !413
  ret double %3, !dbg !414
}
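
For reference, output along these lines can be reproduced with a plain clang invocation; the optimization level used here isn't stated, so -O2 below is an assumption, and program.cpp is just a placeholder file name:

clang++ -O2 -S -emit-llvm program.cpp -o program.ll
clang++ -O2 -ffast-math -S -emit-llvm program.cpp -o program_fastmath.ll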

I have a similar problem in my system, following the same pattern as above. Consider the following IR:

ST_X_nullcheck_false:                             ; preds = %groupby_nullcheck_true
  %10 = call i8* @array_buff(i8* %col_buf0, i64 %pos) #14
  %ST_Transform_Array2 = alloca [2 x double], align 8
  %ST_Transform_Array2.sub = getelementptr inbounds [2 x double], [2 x double]* %ST_Transform_Array2, i64 0, i64 0
  %11 = bitcast i8* %10 to i32*
  %compressed_x_coord = load i32, i32* %11, align 4
  %12 = call double @decompress_x_coord_geoint(i32 %compressed_x_coord) #14
  store double %12, double* %ST_Transform_Array2.sub, align 8
  %13 = getelementptr i8, i8* %10, i64 4
  %14 = bitcast i8* %13 to i32*
  %compressed_y_coord = load i32, i32* %14, align 4
  %15 = call double @decompress_y_coord_geoint(i32 %compressed_y_coord) #14
  %16 = getelementptr inbounds [2 x double], [2 x double]* %ST_Transform_Array2, i64 0, i64 1
  store double %15, double* %16, align 8
  %17 = fmul double %12, 0x40FB2D77DA3A083A
  store double %17, double* %ST_Transform_Array2.sub, align 8
  %18 = fmul double %15, 0x3F81DF46A252DD11
  %19 = fadd double %18, 0x3FE921FB54441D52
  %20 = call double @Tan(double %19) #15
  %21 = call double @ln(double %20) #15
  %22 = fmul double %21, 0x415854A63FF16B12
  store double %22, double* %16, align 8
  %x_coord2 = load double, double* %ST_Transform_Array2.sub, align 8
  br label %filter_false

filter_false:                                     ; preds = %groupby_nullcheck_true, %ST_X_nullcheck_false
  %ST_X_nullcheck_value = phi double [ %x_coord2, %ST_X_nullcheck_false ], [ 0x10000000000000, %groupby_nullcheck_true ]
  %3 = getelementptr inbounds i64, i64* %6, i64 1
  call void @agg_id_double_shared(i64* nonnull %3, double %ST_X_nullcheck_value)
  ret i32 0

Here, I have manually added the attributes ({ nobuiltin nofree norecurse nosync nounwind readnone willreturn }) to the @Tan and @ln functions, and neither of the outputs from those functions is used later in the function. If I remove either one of the two calls, I get the desired optimization -- the entire "y" branch is removed, the alloca is converted to a scalar value, and the scalar is read by the phi in filter_false. But something about the composition of the functions seems to be preventing instruction combining from culling the IR. I've played with a number of combinations of attributes and/or passes, but am beginning to suspect I've overlooked something more basic.
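
A minimal sketch of how attributes like these can be added through the C++ API (this is not the actual annotation pass used here; the helper and its name are made up, the attribute set just mirrors the list above, and the exact spellings vary across LLVM versions -- newer releases model readnone as memory(none)):

#include <llvm/ADT/StringRef.h>
#include <llvm/IR/Attributes.h>
#include <llvm/IR/Function.h>
#include <llvm/IR/Module.h>

// Hypothetical helper: mark a runtime math function as side-effect free so
// that calls whose results are unused can be deleted.
static void markAsPure(llvm::Module& module, llvm::StringRef name) {
  llvm::Function* fn = module.getFunction(name);
  if (!fn) {
    return;
  }
  fn->addFnAttr(llvm::Attribute::NoBuiltin);
  fn->addFnAttr(llvm::Attribute::NoFree);
  fn->addFnAttr(llvm::Attribute::NoRecurse);
  fn->addFnAttr(llvm::Attribute::NoSync);
  fn->addFnAttr(llvm::Attribute::NoUnwind);
  fn->addFnAttr(llvm::Attribute::ReadNone);
  fn->addFnAttr(llvm::Attribute::WillReturn);
}

// e.g. markAsPure(*module, "Tan"); markAsPure(*module, "ln");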

Answers (1)

I was able to resolve this using only existing LLVM passes.

The key was to add the NewGVN LLVM pass prior to instruction combining. NewGVN removed the load here:

  %x_coord2 = load double, double* %ST_Transform_Array2.sub, align 8

and instead used the output of the multiplication instruction directly in the corresponding PHI node. This allowed the instruction combining and dead code elimination passes to mark first the store instructions, and then the function calls, as dead. The minimal set of passes needed for this optimization (after the custom function annotation pass) ended up being:

pass_manager.add(llvm::createSROAPass());
// mem ssa drops unused load and store instructions, e.g. passing variables directly
// where possible
pass_manager.add(
    llvm::createEarlyCSEPass(/*enable_mem_ssa=*/true));  // Catch trivial redundancies

pass_manager.add(llvm::createJumpThreadingPass());  // Thread jumps.
pass_manager.add(llvm::createCFGSimplificationPass());

pass_manager.add(llvm::createNewGVNPass());

pass_manager.add(llvm::createInstructionCombiningPass());
The passes must be run in the order shown above.
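
For completeness, a minimal sketch of how that sequence can be wired up with the legacy pass manager (this assumes a fully populated llvm::Module named module, omits the rest of the JIT setup, and the function name is made up):

#include <llvm/IR/LegacyPassManager.h>
#include <llvm/IR/Module.h>
#include <llvm/Transforms/InstCombine/InstCombine.h>
#include <llvm/Transforms/Scalar.h>

// Run the pass sequence above over a module with the legacy pass manager.
void runOptimizationPasses(llvm::Module& module) {
  llvm::legacy::PassManager pass_manager;
  pass_manager.add(llvm::createSROAPass());
  pass_manager.add(llvm::createEarlyCSEPass(/*enable_mem_ssa=*/true));
  pass_manager.add(llvm::createJumpThreadingPass());
  pass_manager.add(llvm::createCFGSimplificationPass());
  pass_manager.add(llvm::createNewGVNPass());
  pass_manager.add(llvm::createInstructionCombiningPass());
  pass_manager.run(module);
}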
