Mayou

Reputation: 8818

Summary table by group in R

Consider the following dataframe:

 df <- data.frame(group = c("group1", "group1", "group2", "group2", "group2", "group3"), factor = paste("factor", 1:6, sep=""), vol = seq(from = 0.02, length.out = 6, by = 0.02))

The first column defines a top-level group for each factor in the second column. The third column, vol, holds the standard deviation of each factor.

I would like to generate a summary table with the groups only, where the standard deviation of each group is defined as the square root of the sum of the squared vol values in that group, i.e. sqrt(sum(vol^2)).

Is there an easy way of creating the summary table, where the vol of each group is computed using this custom function?

Any help would be appreciated! Thank you.

Upvotes: 3

Views: 8109

Answers (4)

søren

Reputation: 80

I can recommend aggregate() from the stats package, which ships with base R, though you have to define the custom function first.

# root-sum-of-squares of a numeric vector
ss <- function(x) sqrt(sum(x^2))
aggregate(vol ~ group, data = df, FUN = ss)
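
As a minimal sketch of the same call, assuming the df from the question, the helper can also be passed inline as an anonymous function. The name res is only illustrative:

```r
# Data frame from the question
df <- data.frame(group = c("group1", "group1", "group2", "group2", "group2", "group3"),
                 factor = paste("factor", 1:6, sep = ""),
                 vol = seq(from = 0.02, length.out = 6, by = 0.02))

# Root-sum-of-squares of vol within each group, helper written inline
res <- aggregate(vol ~ group, data = df, FUN = function(x) sqrt(sum(x^2)))
print(res)
```

The result is an ordinary data frame with one row per group and the columns group and vol, so it can be merged or written out like any other table.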

Upvotes: 1

Brandon Bertelsen

Reputation: 44648

A base solution for good measure.

by(df,df$group,function(x) sqrt(sum(x$vol^2)))

If you need it to look prettier, wrap the by() call in as.table():

as.table(by(df, df$group, function(x) sqrt(sum(x$vol^2))))

df$group
    group1     group2     group3 
0.04472136 0.14142136 0.12000000 
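
A related base-R sketch, swapping in tapply() for by() (not part of the original answer), returns the same numbers as a named array directly. It assumes the df from the question, and the name out is only illustrative:

```r
# Data frame from the question
df <- data.frame(group = c("group1", "group1", "group2", "group2", "group2", "group3"),
                 factor = paste("factor", 1:6, sep = ""),
                 vol = seq(from = 0.02, length.out = 6, by = 0.02))

# Apply the root-sum-of-squares function to vol, split by group
out <- tapply(df$vol, df$group, function(v) sqrt(sum(v^2)))
print(out)
```

Because tapply() works on the vol vector alone, the function receives a plain numeric vector rather than a sub-data-frame.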

Upvotes: 5

TWL

Reputation: 2300

May I propose a solution using the ddply function from the plyr package:

library(plyr)
ddply(df, .(group), summarize, std = sqrt(sum(vol^2)))

#    group        std
# 1 group1 0.04472136
# 2 group2 0.14142136
# 3 group3 0.12000000

Upvotes: 4

andyteucher

Reputation: 1453

Using the amazing new dplyr package, I think this is what you're looking for:

require(dplyr)

df <- data.frame(group = c("group1", "group1", "group2", "group2", "group2", "group3"), 
                 factor = paste("factor", 1:6, sep=""), 
                 vol = seq(from = 0.02, length.out = 6, by = 0.02))

df %>% group_by(group) %>% summarise(grp_std = sqrt(sum(vol^2)))

# Source: local data frame [3 x 2]

#    group    grp_std
# 1 group1 0.04472136
# 2 group2 0.14142136
# 3 group3 0.12000000

The chaining syntax using %>% (which replaced the older %.% operator in later dplyr releases) takes a bit of getting used to, but it becomes very intuitive. Alternative syntax without chaining:

df_grouped <- group_by(df, group)

summarise(df_grouped, grp_std=sqrt(sum(vol^2)))

Upvotes: 3
