Peter Houk

Reputation: 107

tapply function with multiple groups

I'm trying to reshape some data tables using tapply. This is straightforward enough if you have one factor, one variable, and your desired mathematical function. However, I have some datasets that I'd like to reformat with two (or perhaps more) grouping levels.

Consider

x<-1:20 # variable
y<-factor(rep(letters[1:5], each=4)) # first grouping variable
z<-factor(rep(letters[6:7], each=10)) # second grouping variable
tapply(x,z,sum) # summarized table for factor z

  f   g 
 55 155

tapply(x,y,sum) # summarized table for factor y

 a  b  c  d  e
10 26 42 58 74 

However, my desired output would be a table that looks something like:

f  f  f  f  f g  g  g  g  g
a  b  c  d  e a  b  c  d  e
6  8  10....etc

So I'm just trying to keep the higher-level grouping in the tables. Sorry if this is a simple question; I've looked around and can't find anything.

Upvotes: 0

Views: 10816

Answers (2)

Kerry

Reputation: 803

This is the code I've used on my own data:

with(reduced, do.call(rbind, tapply(WR, list(period, no.C), 
                           function(x) c(WR = mean(x), SD = sd(x)))))

reduced is my data frame.
WR is the variable I want to calculate the mean of.
period is one of my grouping variables; in this case it's binary.
no.C is another grouping variable; here it has 3 groups.

The rest of the expression is the function. It can easily be replaced by just mean (or sum, or whatever other statistic you are after) if you only want one value. I also wanted the standard deviation, so the function returns both, and do.call(rbind, ...) binds the results into a small table that I can print later. Sorry I didn't put the answer in the context of your data; I was unsure what exactly you wanted.

Basically, by passing a list you can add as many grouping variables as you want while still using tapply.
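
Applied to the x, y and z from the question, a minimal sketch might look like this (the names tab and flat are only for illustration). Combinations that never occur come back as NA, and interaction() with drop = TRUE is one way to get a single flat result holding only the combinations that exist:

```r
x <- 1:20                                   # variable
y <- factor(rep(letters[1:5], each = 4))    # first grouping variable
z <- factor(rep(letters[6:7], each = 10))   # second grouping variable

# Two-way table: rows are the levels of z, columns are the levels of y
tab <- tapply(x, list(z, y), sum)
tab
#    a  b  c  d  e
# f 10 26 19 NA NA
# g NA NA 23 58 74

# One flat, named result with only the z/y combinations present in the data
flat <- tapply(x, interaction(z, y, drop = TRUE), sum)
flat
```

The two-way form keeps the higher-level grouping visible as rows, while the flat form is closer to the single-row layout asked for in the question.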

You can also do something similar with aggregate - see this quick web page for a tidy answer and examples to your question.

with(reduced, aggregate(WR, list(period, no.C), mean))
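
With the question's data, the formula interface of aggregate should give one row per z/y combination that actually occurs. This is only a sketch; the names d and res are my own choice for illustration:

```r
x <- 1:20
y <- factor(rep(letters[1:5], each = 4))
z <- factor(rep(letters[6:7], each = 10))
d <- data.frame(x = x, y = y, z = z)

# Sum x within every combination of z and y that appears in d
res <- aggregate(x ~ z + y, data = d, FUN = sum)
res
```

Unlike the tapply matrix, the result is a long data frame with columns z, y and x, so empty combinations are simply absent rather than shown as NA.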

Upvotes: 3

R for the Win

Reputation: 116

You can use the dplyr package, which is much easier and much faster if you are dealing with large datasets. However, it only works with data frames, so the vectors have to be combined into one first.

library(dplyr)  # provides group_by() and summarise()
d <- data.frame(x=x,y=y,z=z)

For the first case:

groups <- group_by(d,z)
summarise(groups,sum(x))

  z sum(x)
1 f     55
2 g    155

For the second case:

groups <- group_by(d,y)
summarise(groups,sum(x))

  y sum(x)
1 a     10
2 b     26
3 c     42
4 d     58
5 e     74

And for the last case:

groups <- group_by(d,z,y)
summarise(groups,sum(x))

  z y sum(x)
1 f a     10
2 f b     26
3 f c     19
4 g c     23
5 g d     58
6 g e     74

Upvotes: 1
