create z-score for multiple columns based on mean and sd in controls R

Question

I have a large data set with cases and controls and multiple scores that I want to normalize based on the mean and standard deviation in the controls. I want the z-scores to be added to the data frame under the name of the original variable with a "z" in front (for example, ly becomes zly).

A sample of the data can be created with the following code (the full data set contains several more variables):

df <- structure(
  list(
    diagnosis = structure(c(1L, 1L, 3L, 2L, 2L, 3L, 2L, 1L, 2L, 1L, 1L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 3L, 3L),
                                    levels = c("control", "disease1", "disease2"), class = "factor"), 
    ly = c(-0.00121860465, -0.00135562539, -0.000320672533, -0.00121297926, -0.00149409177, -0.000911659334, -0.000955248272, -0.000965053426, -0.00118177247, -0.0011801697, -0.00146885607, -0.000572099309, -0.000875518542, -0.000847725959, -0.00133010695, -0.000926367067, -0.000976587052, -0.000998975487, -0.00177083595, -0.00170147706), 
    as = c(-0.0892422222, -0.0927111112, -0.127388889, -0.113477778, -0.127388880, -0.0827111111, -0.111044444, -0.102133333, -0.106588889, -0.105422222, -0.0789352352, -0.0927111111, -0.102133333, -0.127388889, -0.137388889, -0.127727778, -0.127388889, -0.116044445, -0.116044444, -0.1205239847), 
    ad = c(-0.00126427049, -0.00115668877, -0.00113243494, -0.00117873892, -0.00109579533, -0.00116917575, -0.00126072796, -0.00123688619, -0.00103635843, -0.00119274793, -0.00117878182, -0.00112272946, -0.00126483324, -0.00107161339, -0.00114657463, -0.00119816341, -0.00118735667, -0.00113913769, -0.00113432402, -0.00115775817)), row.names = c(NA, 20L), class = "data.frame")

So far I have created the scores one by one with the following code, which gives the desired output:

df$zly<-(df$ly-mean(df$ly[df$diagnosis=="control"]))/sd(df$ly[df$diagnosis=="control"])
df$zas<-(df$as-mean(df$as[df$diagnosis=="control"]))/sd(df$as[df$diagnosis=="control"])
df$zad<-(df$ad-mean(df$ad[df$diagnosis=="control"]))/sd(df$ad[df$diagnosis=="control"])

However, I have to create more than 20 z-scores in more than 4 different data frames, and I would like to avoid repeating this line for every variable. Any suggestions?

Answers (1)
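
One way to avoid repeating the formula is to wrap it in a small function that loops over the column names. The following is a minimal base R sketch, not a definitive implementation. The helper name `add_control_z` and its arguments are hypothetical, and the toy data frame only stands in for the real data. It assumes the grouping column is called `diagnosis`, the reference level is `"control"`, and the score columns contain no missing values (as in the sample data).

```r
# Hypothetical helper: adds a "z"-prefixed column for every name in `cols`,
# standardized against the mean and sd of the reference (control) rows.
add_control_z <- function(data, cols, group = "diagnosis", ref = "control") {
  # %in% returns FALSE (not NA) for rows where the group is missing
  is_ref <- data[[group]] %in% ref
  for (col in cols) {
    ref_vals <- data[[col]][is_ref]
    data[[paste0("z", col)]] <- (data[[col]] - mean(ref_vals)) / sd(ref_vals)
  }
  data
}

# Toy data for illustration only (assumed values, not from the question)
toy <- data.frame(
  diagnosis = factor(c("control", "control", "control", "disease1", "disease2")),
  ly = c(1, 2, 3, 4, 6),
  as = c(10, 20, 30, 20, 50)
)

# One data frame
toy_z <- add_control_z(toy, cols = c("ly", "as"))

# Several data frames at once: keep them in a list and use lapply()
toy_list <- list(a = toy, b = toy)
toy_list_z <- lapply(toy_list, add_control_z, cols = c("ly", "as"))
```

With the data frame from the question, the call would be `df <- add_control_z(df, c("ly", "as", "ad"))`. If every column except `diagnosis` is a score, `setdiff(names(df), "diagnosis")` can supply the column names instead of typing them out.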
