Reputation: 1218
I am quite new to R. I have a table with the header (Value, Benchmark, Suite, Var), and I want to normalize each Value to the mean of the baseline for the same combination of (Benchmark, Var). That is, each entry (Value, Benchmark, Suite, Var) should be divided by the mean baseline Value of the rows where Benchmark and Var are equal to its own.
The data represents different benchmark measurements, where Var is the input size. The data looks like this:
Value Benchmark Suite Var
500 Benchmark2 baseline 1732
889 Benchmark baseline 1732
500 Benchmark2 baseline 1732
889 Benchmark baseline 1732
300 Benchmark Approach1 1732
100 Benchmark2 Approach1 1732
After the transformation, it would look like this:
Value Benchmark Suite Var RuntimeRatio
500 Benchmark2 baseline 1732 1.00
889 Benchmark baseline 1732 1.00
500 Benchmark2 baseline 1732 1.00
889 Benchmark baseline 1732 1.00
300 Benchmark Approach1 1732 0.34 # 300 compared to mean(889,889) of each (Benchmark,baseline,1732)
100 Benchmark2 Approach1 1732 0.20 # 100 compared to mean(500,500) of each (Benchmark2,baseline,1732)
I currently have something like this, but it does not calculate the right thing:
norm <- ddply(data, Var ~ Benchmark, transform,
RuntimeRatio = Value / mean(Value[Suite == "baseline"]))
Upvotes: 0
Views: 73
Reputation: 7818
I think the cleanest way to do this is a bit of data manipulation before computing the ratio.
Your Data:
df <- tibble::tribble(
~Value, ~Benchmark , ~Suite , ~Var,
500 , "Benchmark2", "baseline" , 1732,
889 , "Benchmark" , "baseline" , 1732,
500 , "Benchmark2", "baseline" , 1732,
889 , "Benchmark" , "baseline" , 1732,
300 , "Benchmark" , "Approach1" , 1732,
100 , "Benchmark2", "Approach1" , 1732
)
With the dplyr package, this kind of manipulation is straightforward.
library(dplyr)
# separate the baseline from the rest
df_baseline <- df %>% filter(Suite == "baseline")
df_compare <- df %>% filter(Suite != "baseline")
# calculate the mean of the baseline value for each Benchmark-Var
df_baseline <- df_baseline %>%
  group_by(Benchmark, Var) %>%
  summarise(Value_baseline = mean(Value)) %>%
  ungroup()
# join the baseline means to the rest of your data with the approaches
df_compare <- df_compare %>%
  left_join(df_baseline, by = c("Benchmark", "Var"))
# calculate your ratio
df_compare %>%
  mutate(RuntimeRatio = Value / Value_baseline)
# # A tibble: 2 x 6
# Value Benchmark Suite Var Value_baseline RuntimeRatio
# <dbl> <chr> <chr> <dbl> <dbl> <dbl>
# 1 300 Benchmark Approach1 1732 889 0.337
# 2 100 Benchmark2 Approach1 1732 500 0.2
This gives what I believe you need: the ratio for each non-baseline row. If you want exactly the output you asked for, with the baseline rows included, join df_baseline to the original df instead:
df %>%
  left_join(df_baseline, by = c("Benchmark", "Var")) %>%
  mutate(RuntimeRatio = Value / Value_baseline) %>%
  select(-Value_baseline)
# # A tibble: 6 x 5
# Value Benchmark Suite Var RuntimeRatio
# <dbl> <chr> <chr> <dbl> <dbl>
# 1 500 Benchmark2 baseline 1732 1
# 2 889 Benchmark baseline 1732 1
# 3 500 Benchmark2 baseline 1732 1
# 4 889 Benchmark baseline 1732 1
# 5 300 Benchmark Approach1 1732 0.337
# 6 100 Benchmark2 Approach1 1732 0.2
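If you would rather avoid the intermediate tables, the same ratio can probably be computed in a single grouped mutate. This is only a minimal sketch, assuming every (Benchmark, Var) group contains at least one baseline row; the name result is just illustrative. For a group without a baseline, mean() of an empty vector is NaN, so the ratio for that group would be NaN as well.

```r
library(dplyr)

df <- tibble::tribble(
  ~Value, ~Benchmark  , ~Suite     , ~Var,
  500   , "Benchmark2", "baseline" , 1732,
  889   , "Benchmark" , "baseline" , 1732,
  500   , "Benchmark2", "baseline" , 1732,
  889   , "Benchmark" , "baseline" , 1732,
  300   , "Benchmark" , "Approach1", 1732,
  100   , "Benchmark2", "Approach1", 1732
)

# Within each Benchmark-Var group, divide every Value by the mean
# of that group's baseline rows (assumes a baseline exists per group).
result <- df %>%
  group_by(Benchmark, Var) %>%
  mutate(RuntimeRatio = Value / mean(Value[Suite == "baseline"])) %>%
  ungroup()

result
```

Because mutate keeps all rows and their order, the baseline rows stay in the table with a ratio of 1 and no join or select is needed.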
Upvotes: 2