I want to calculate Z-scores using the mean and standard deviation of each group. For example, I have the following table. It has 3 groups of data, and I can compute the mean and standard deviation for each group. Then I use the group 1 mean and SD to calculate the Z-scores for the group 1 data points, and so on.
> dat
   group level    y
1      1     A 10.8
2      1     B 12.0
3      1     C  9.6
4      1     A 12.0
5      1     B  7.8
6      1     C 10.8
7      2     A  8.7
8      2     B  9.2
9      2     C  8.2
10     2     A 10.0
11     2     B 12.2
12     2     C  8.2
13     3     A 10.9
14     3     B  8.3
15     3     C 10.1
16     3     A  9.9
17     3     B 10.9
18     3     C 10.3
I have learned from this blog how to get summary data by group, but I'm not sure how to go from there.
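Going from a per-group summary to z-scores mostly means looking up each row's own group mean and SD. Below is a minimal base-R sketch of exactly the steps described in the question, assuming the table is stored in a data frame called dat (rebuilt here from the printed values). The helper names groups, grp_mean, grp_sd and idx are illustrative only:

```r
# Rebuild the example table from the question
dat <- data.frame(
  group = rep(1:3, each = 6),
  level = rep(c("A", "B", "C"), times = 6),
  y = c(10.8, 12.0, 9.6, 12.0, 7.8, 10.8,
        8.7, 9.2, 8.2, 10.0, 12.2, 8.2,
        10.9, 8.3, 10.1, 9.9, 10.9, 10.3)
)

# Step 1: summary by group (one mean and one SD per group)
groups   <- sort(unique(dat$group))
grp_mean <- sapply(groups, function(g) mean(dat$y[dat$group == g]))
grp_sd   <- sapply(groups, function(g) sd(dat$y[dat$group == g]))

# Step 2: for every row, find the position of its group in the summary
idx <- match(dat$group, groups)

# Step 3: standardize each value with its own group's mean and SD
dat$z <- (dat$y - grp_mean[idx]) / grp_sd[idx]
```

sd() uses the n - 1 denominator, the same as scale(), so these values agree with what scale() gives when applied group by group.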
Thanks.
Upvotes: 9
Views: 16293
Reputation: 2183
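In dplyr:
library(dplyr)
# Standardize y within each group; as.numeric() drops the one-column matrix
# that scale() returns, so z_score is an ordinary numeric column
dat_z <- dat %>%
  group_by(group) %>%
  mutate(z_score = as.numeric(scale(y))) %>%
  ungroup()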
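One detail worth knowing when using scale() inside mutate(): scale() returns a one-column matrix carrying "scaled:center" and "scaled:scale" attributes, not a plain vector, so the new column can end up as a matrix column. A small base-R sketch, using only the group 1 values from the question, that shows this and checks the result against the formula (y - mean(y)) / sd(y):

```r
# Group 1 values from the question's table
y <- c(10.8, 12.0, 9.6, 12.0, 7.8, 10.8)

s <- scale(y)
dim(s)                    # 6 rows, 1 column: a matrix, not a vector
attr(s, "scaled:center")  # the mean used for centering
attr(s, "scaled:scale")   # the SD used for scaling

# as.numeric() drops the dim and attributes, leaving a plain vector
z <- as.numeric(s)
all.equal(z, (y - mean(y)) / sd(y))  # TRUE: same as the textbook formula
```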
Upvotes: 3
Reputation: 1925
Base R (i.e., no dependencies required) includes the functions ave() (for groupwise application) and scale() (for calculating z-scores):
dat$z <- ave(dat$y, dat$group, FUN=scale)
Then the new variable z in dat will contain the groupwise-scaled values.
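Note that, unlike similar Base R functions (e.g., sapply, lapply), ave() requires you to write FUN= explicitly: FUN comes after the ... argument that collects the grouping variables, so an unnamed function would be captured by ... instead of being used as FUN.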
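ave() is a split-apply-combine helper: it splits y by the grouping variable, applies FUN to each piece and writes the results back into the original row positions. A sketch of the same idea spelled out with split(), lapply() and unsplit(), assuming dat is the table from the question (rebuilt here so the snippet runs on its own):

```r
dat <- data.frame(
  group = rep(1:3, each = 6),
  level = rep(c("A", "B", "C"), times = 6),
  y = c(10.8, 12.0, 9.6, 12.0, 7.8, 10.8,
        8.7, 9.2, 8.2, 10.0, 12.2, 8.2,
        10.9, 8.3, 10.1, 9.9, 10.9, 10.3)
)

# Split: one numeric vector of y per group (a list named "1", "2", "3")
pieces <- split(dat$y, dat$group)

# Apply: standardize each group's values with that group's own mean and SD
pieces <- lapply(pieces, function(v) as.numeric(scale(v)))

# Combine: put the z-scores back in the original row order
z_split <- unsplit(pieces, dat$group)

# Same values as the ave() one-liner
z_ave <- ave(dat$y, dat$group, FUN = scale)
all.equal(z_split, z_ave)  # TRUE
```

The explicit version is longer, but it makes it easy to swap in a different per-group function or to inspect the intermediate pieces.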
Upvotes: 14
Reputation: 835
You can use the ddply() function from plyr to calculate the z-score:
library(plyr)
# transform (rather than summarize) keeps level and y alongside z_score
dat <- ddply(dat, .(group), transform, z_score = as.numeric(scale(y)))
Or you can calculate it manually:
dat <- ddply(dat, .(group), transform, z_score = (y - mean(y)) / sd(y))
If your data contain NAs, add na.rm = TRUE to the mean() and sd() calls.
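To illustrate the na.rm advice with the manual formula z = (y - group mean) / group SD, here is a base-R sketch. It swaps ddply() for ave() so it runs without plyr; the missing value and the helper name z_na are hypothetical, added only for the demonstration:

```r
dat <- data.frame(
  group = rep(1:3, each = 6),
  level = rep(c("A", "B", "C"), times = 6),
  y = c(10.8, 12.0, 9.6, 12.0, 7.8, 10.8,
        8.7, 9.2, 8.2, 10.0, 12.2, 8.2,
        10.9, 8.3, 10.1, 9.9, 10.9, 10.3)
)
dat$y[2] <- NA  # hypothetical missing value, not in the original data

# Hypothetical helper: z-score that ignores NAs when estimating the
# group mean and SD (the NA itself stays NA in the output)
z_na <- function(v) (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)

dat$z <- ave(dat$y, dat$group, FUN = z_na)
dat[dat$group == 1, ]
```

Without na.rm = TRUE, mean() and sd() would return NA for group 1, and every z-score in that group would become NA.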
Hope this helps.
Upvotes: 3
Reputation: 653
I would check out data.table for this.
Something like:
library(data.table)
datDT <- as.data.table(dat)
# := adds the column by reference; by = group standardizes within each group
datDT[, yScaled := as.numeric(scale(y)), by = group]
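Whichever approach you pick (dplyr, ave(), plyr or data.table), a quick check is that every group's z-scores have mean 0 and standard deviation 1. A base-R sketch of that check, using ave() to produce the column and assuming dat holds the table from the question:

```r
dat <- data.frame(
  group = rep(1:3, each = 6),
  level = rep(c("A", "B", "C"), times = 6),
  y = c(10.8, 12.0, 9.6, 12.0, 7.8, 10.8,
        8.7, 9.2, 8.2, 10.0, 12.2, 8.2,
        10.9, 8.3, 10.1, 9.9, 10.9, 10.3)
)
dat$z <- ave(dat$y, dat$group, FUN = scale)

# Within each group the z-scores should have mean 0 and SD 1,
# up to floating-point rounding
grp_means <- tapply(dat$z, dat$group, mean)
grp_sds   <- tapply(dat$z, dat$group, sd)
round(grp_means, 10)
round(grp_sds, 10)
```

For the data.table version, run the same tapply() calls on datDT$yScaled and datDT$group.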
Upvotes: 6