Dale Kube

Reputation: 1460

Adding Standard Deviation for Each Unique Factor Grouping

I'm trying to add a column containing the standard deviation of each unique factor grouping in my data. I've researched techniques using the data.table and plyr packages but haven't had any luck. Here is a basic example of what I'm trying to accomplish.

Group  Hours
  120     45
  120     60
  120     54
  121     33
  121     55
  121     40

I'm trying to turn the above into:

Group  Hours      SD
  120     45   7.550
  120     60   7.550
  120     54   7.550
  121     33  11.240
  121     55  11.240
  121     40  11.240

Upvotes: 0

Views: 92

Answers (2)

Dale Kube

Reputation: 1460

Thank you, David, for your detailed response! I used data.table to get what I was looking for. Here is a snippet of my final script, based on David's answer.

PayrollHoursSD <- as.data.table(PayrollHours2)[, SD := sd(TOTAL.HOURS), by = COMBO]
head(PayrollHoursSD)

#    COMBO    PAY.END.DATE  TOTAL.HOURS          SD
# 1:   1-2           10-06     42561.78    4297.287
# 2:   1-2           10-13     42177.88    4297.287
# 3:   1-2           10-20     44691.23    4297.287
# 4:   1-2           10-27     42709.28    4297.287
# 5:   1-2           11-03     44876.25    4297.287
# 6:   1-2           11-10     40582.44    4297.287

Upvotes: 0

David Arenburg

Reputation: 92302

Base R solution (assuming your data frame is called df)

transform(df, SD = ave(Hours, Group, FUN = sd))
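
To see what the base R line does, here is a minimal sketch that assumes the sample data from the question has been typed in by hand as df (the question does not show how the data was loaded). ave() applies sd within each Group and returns a vector as long as Hours, so every row receives its group's value. Note that sd() computes the sample standard deviation (n - 1 denominator), which for this data is about 7.550 for group 120 and about 11.240 for group 121.

```r
# Sample data copied from the question; building it by hand is an assumption
df <- data.frame(
  Group = c(120, 120, 120, 121, 121, 121),
  Hours = c(45, 60, 54, 33, 55, 40)
)

# ave() splits Hours by Group, applies sd to each piece, and puts the
# (recycled) result back in the original row order
result <- transform(df, SD = ave(Hours, Group, FUN = sd))

print(result)
```

transform() returns a new data frame, so df itself is left unchanged unless the result is assigned back to it.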

data.table solution

library(data.table)
setDT(df)[, SD := sd(Hours), by = Group]
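
One thing to keep in mind with the line above: setDT() converts df to a data.table by reference, and := then adds the SD column in place, so df itself is modified. If the original data frame should stay untouched, a possible variant (a sketch, again assuming the question's sample data typed in by hand) is to work on a copy made with as.data.table():

```r
library(data.table)

# Sample data copied from the question; building it by hand is an assumption
df <- data.frame(
  Group = c(120, 120, 120, 121, 121, 121),
  Hours = c(45, 60, 54, 33, 55, 40)
)

# as.data.table() makes a copy, so df stays a plain data.frame
dt <- as.data.table(df)

# := adds the SD column to dt in place, computed separately for each Group
dt[, SD := sd(Hours), by = Group]

print(dt)
print(names(df))  # still only Group and Hours
```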

dplyr solution

library(dplyr)
df %>%
  group_by(Group) %>%
  mutate(SD = sd(Hours))

And here's a plyr solution (my first ever), since you asked for one

library(plyr)
ddply(df, .(Group), mutate, SD = sd(Hours))

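(It is better to avoid having both plyr and dplyr loaded at the same time: both export functions such as mutate and summarise, and whichever package is attached last masks the other's versions.)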
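
If both packages do end up in the same session, one way to sidestep the name clash is to call the functions with an explicit namespace prefix rather than relying on which package was attached last. The following is only a sketch of that idea using the dplyr version, with the question's sample data typed in by hand as an assumption:

```r
# Sample data copied from the question; building it by hand is an assumption
df <- data.frame(
  Group = c(120, 120, 120, 121, 121, 121),
  Hours = c(45, 60, 54, 33, 55, 40)
)

# Qualify each call with dplyr:: so the result does not depend on load order
grouped <- dplyr::group_by(df, Group)
result <- dplyr::mutate(grouped, SD = sd(Hours))

# Drop the grouping so later operations act on the whole table again
result <- dplyr::ungroup(result)

print(result)
```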

Upvotes: 4
