Reputation: 400
I want to normalize data in R, but not to a specific range (e.g. 0 to 1). I have a table like the following:
benchmark | technique | ipc
correlation | no_compression | 0.5
correlation | compression-bdi | 0.6
trisolv | no_compression | 0.6
trisolv | compression-bdi | 0.7
I want the IPC value of no_compression for every benchmark to be 1. The other techniques for a benchmark should then be scaled relative to that benchmark's no_compression value. For example, the IPC value of compression-bdi for correlation would be 0.6 / 0.5 = 1.2.
Is there any function that I could use? I can only find mentions of normalizing to a certain range.
Upvotes: 0
Views: 144
Reputation: 388862
You could also use match, which returns the index of the first match, to find the "no_compression" technique:
library(dplyr)
df %>%
group_by(benchmark) %>%
mutate(ipc = ipc/ipc[match('no_compression', technique)])
# benchmark technique ipc
# <fct> <fct> <dbl>
#1 correlation no_compression 1
#2 correlation compression-bdi 1.2
#3 trisolv no_compression 1
#4 trisolv compression-bdi 1.17
Using data.table, that would be:
library(data.table)
setDT(df)[, ipc := ipc/ipc[match('no_compression', technique)], benchmark]
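To see why match is convenient here, the following is a minimal base R sketch. It assumes the data frame is called df and rebuilds it by hand from the table in the question; the label "missing_baseline" and the names one_group, ratio_ok and ratio_na are made up for illustration. match gives the position of the first hit, or NA when there is none, so a group without a baseline row ends up with NA ratios, whereas a logical subset such as ipc[technique == 'no_compression'] would return a zero-length vector in that case.

```r
# Data from the question, rebuilt by hand (the name `df` is an assumption).
df <- data.frame(
  benchmark = c("correlation", "correlation", "trisolv", "trisolv"),
  technique = c("no_compression", "compression-bdi",
                "no_compression", "compression-bdi"),
  ipc       = c(0.5, 0.6, 0.6, 0.7)
)

# match() returns the position of the first hit, or NA when there is none.
match("no_compression", df$technique)    # position of the first baseline row
match("missing_baseline", df$technique)  # NA, this label is hypothetical

# Work on a single benchmark to mimic what happens inside one group.
one_group <- df[df$benchmark == "trisolv", ]

# Baseline present: every value is divided by the baseline IPC.
ratio_ok <- one_group$ipc /
  one_group$ipc[match("no_compression", one_group$technique)]

# Baseline absent: indexing with NA yields NA, so all ratios become NA.
ratio_na <- one_group$ipc /
  one_group$ipc[match("missing_baseline", one_group$technique)]
```

If a benchmark has several no_compression rows, match silently uses the first one, so it is worth checking for duplicates beforehand if that could happen in your data.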
Upvotes: 0
Reputation: 35207
Using dplyr:
library(dplyr)
df %>%
group_by(benchmark) %>%
mutate(ipc_standardized = ipc / ipc[technique == 'no_compression'])
# A tibble: 4 x 4
# Groups:   benchmark [2]
#   benchmark   technique         ipc ipc_standardized
#   <chr>       <chr>           <dbl>            <dbl>
# 1 correlation no_compression    0.5             1
# 2 correlation compression-bdi   0.6             1.2
# 3 trisolv     no_compression    0.6             1
# 4 trisolv     compression-bdi   0.7             1.17
Or using base R:
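# unsplit() puts the per-benchmark results back in the original row order,
# so this also works when df is not sorted by benchmark
df$ipc_standardized <- unsplit(lapply(
split(df, df$benchmark),
function(.) .$ipc / .$ipc[.$technique == 'no_compression']),
df$benchmark
)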
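A lookup by benchmark name is another base R option that does not depend on row order. This is a minimal sketch, assuming exactly one no_compression row per benchmark; the shuffled df below and the helper names is_base and baseline are made up for illustration.

```r
# Same values as in the question, deliberately not sorted by benchmark.
df <- data.frame(
  benchmark = c("trisolv", "correlation", "trisolv", "correlation"),
  technique = c("no_compression", "no_compression",
                "compression-bdi", "compression-bdi"),
  ipc       = c(0.6, 0.5, 0.7, 0.6)
)

# Rows holding the baseline technique.
is_base <- df$technique == "no_compression"

# Named vector: benchmark name -> baseline IPC.
baseline <- setNames(df$ipc[is_base], as.character(df$benchmark[is_base]))

# Look up each row's baseline by name and divide. as.character() guards
# against benchmark being a factor, which would index by integer code.
df$ipc_standardized <- df$ipc / unname(baseline[as.character(df$benchmark)])
```

A benchmark without a no_compression row gets NA here, because the name lookup finds nothing.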
Upvotes: 2