Sylvia Rodriguez

Reputation: 1353

Calculate sub-column-wise z scores in R data.frame

I have the following example data.frame.

df = data.frame(a=c(rep("a",8), rep("b",5), rep("c",7), rep("d",10)), 
    b=rnorm(30, 6, 2), 
    c=rnorm(30, 12, 3.5), 
    d=rnorm(30, 8, 3)
    )

For each of the columns b, c, and d, I would like to calculate z-scores within each subgroup defined by column a. This post was helpful, and I can now do this as follows:

df$b.zscore <- ave(df$b, df$a, FUN = scale)
df$c.zscore <- ave(df$c, df$a, FUN = scale)
df$d.zscore <- ave(df$d, df$a, FUN = scale)

However, my real data has many more columns. Is there a more elegant way to do this for columns b through d, perhaps with a for loop? How could I do that? Thank you.

Upvotes: 0

Views: 135

Answers (1)

Ronak Shah

Reputation: 388807

You can use lapply over the columns:

cols <- c('b', 'c', 'd')
new_cols <- paste0(cols, '_zscore')
df[new_cols] <- lapply(df[cols], function(x) ave(x, df$a, FUN = scale))
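Since the question asks about a for loop, here is a minimal sketch of the same idea written as an explicit loop. It rebuilds the example data.frame from the question; the set.seed() call is an assumption added only so the example is reproducible.

```r
set.seed(42)  # assumption: not in the question, added for reproducibility

df <- data.frame(a = c(rep("a", 8), rep("b", 5), rep("c", 7), rep("d", 10)),
                 b = rnorm(30, 6, 2),
                 c = rnorm(30, 12, 3.5),
                 d = rnorm(30, 8, 3))

cols <- c("b", "c", "d")

for (col in cols) {
  # ave() applies FUN within each level of df$a and returns a vector
  # in the original row order; as.numeric() drops the matrix
  # attributes that scale() attaches to its result.
  df[[paste0(col, "_zscore")]] <- ave(df[[col]], df$a,
                                      FUN = function(x) as.numeric(scale(x)))
}
```

The loop and the lapply call should give the same values; lapply simply avoids the explicit indexing.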

However, operations that apply the same function to multiple columns are often more convenient with dplyr:

library(dplyr)

df %>%
  group_by(a) %>%
  # across() names the new columns b_zscore, c_zscore, d_zscore
  mutate(across(b:d, list(zscore = ~as.numeric(scale(.)))))
# For dplyr < 1.0.0, replace the mutate() line with:
#   mutate_at(vars(b:d), list(zscore = ~as.numeric(scale(.))))

And with data.table:

library(data.table)
setDT(df)[, (new_cols) := lapply(.SD, function(x) as.numeric(scale(x))), a, 
            .SDcols = cols]
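Whichever variant you use, a quick sanity check is that each z-score column has mean 0 and standard deviation 1 within every group. A minimal sketch using the base R approach on a smaller, made-up data.frame (the seed, group sizes, and the names group_means and group_sds are assumptions for illustration only):

```r
set.seed(1)  # assumption: added for reproducibility

df <- data.frame(a = rep(c("a", "b", "c"), times = c(8, 5, 7)),
                 b = rnorm(20, 6, 2),
                 c = rnorm(20, 12, 3.5))

cols <- c("b", "c")
new_cols <- paste0(cols, "_zscore")
df[new_cols] <- lapply(df[cols], function(x) ave(x, df$a, FUN = scale))

# One row per group, one column per z-score column
group_means <- sapply(df[new_cols], function(z) tapply(z, df$a, mean))
group_sds   <- sapply(df[new_cols], function(z) tapply(z, df$a, sd))

# Compare with a tolerance rather than exact equality
all(abs(group_means) < 1e-8)
all(abs(group_sds - 1) < 1e-8)
```

Both comparisons use a small tolerance because floating-point arithmetic rarely yields exactly 0 or 1.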

Upvotes: 1
