Reputation:
I have a large data frame that looks like this
iid Q1 Q2 Q16 Q17 Q18 pop a b c
Sample1 0 0 0 0 0 First A br1 Alpha1
Sample2 0 0 0 0 0 First A br1 Alpha1
Sample3 0 0 0 0 0 Second B br1 Beta
Sample4 0 0 0 0 0 Second B br1 Beta
Sample5 0 0 0 0 0 Third C br2 Gamma
Sample6 0 0 0 0 0 Fourth C br2 Delta
Sample7 0 0 0 0 0 First A br1 Alpha2
Sample8 0 0 0 0 0 First A br1 Alpha2
Those 0s are stand-ins for numbers between 0 and 1, which add up to 1. For a table this size, I could easily do it by hand, but my actual data frame has #### samples (iid) from ### populations (pop) with 18 Q columns. The samples are not sorted by population order (though I could sort it if it would simplify the coding).
For each population I am looking to get the average value for all 18 Q columns, so I can get an output like this:
Q1 Q2 Q16 Q17 Q18 a b c
0 0 0 0 0 First A br1
0 0 0 0 0 Fourth C br2
0 0 0 0 0 Second B br1
0 0 0 0 0 Third C br2
as well as output like this (when sorting by c instead)
Q1 Q2 Q16 Q17 Q18 pop a b c
0 0 0 0 0 First A br1 Alpha1
0 0 0 0 0 First A br1 Alpha2
0 0 0 0 0 Second B br1 Beta
0 0 0 0 0 Fourth C br2 Delta
0 0 0 0 0 Third C br2 Gamma
I was wondering how could I get R to do something like this?
Upvotes: 0
Views: 149
Reputation: 887118
We can use dplyr
. We specify the grouping columns in the group_by
and get the mean
of the 'Q' columns with summarise_each
. In matches
we can use regex patterns to match the column names. In this case I am matching column names that start with 'Q' (^Q
) followed by numbers (\\d+
).
library(dplyr)
df1 %>%
group_by(pop, a, b) %>%
summarise_each(funs(mean), matches('^Q\\d+'))
# pop a b Q1 Q2 Q16 Q17 Q18
#1 First A br1 0 0 0 0 0
#2 Fourth C br2 0 0 0 0 0
#3 Second B br1 0 0 0 0 0
#4 Third C br2 0 0 0 0 0
For the second case, we need to add the 'c' also in the group_by
df1 %>%
group_by(pop, a, b, c) %>%
summarise_each(funs(mean), matches('^Q\\d+'))
# pop a b c Q1 Q2 Q16 Q17 Q18
#1 First A br1 Alpha1 0 0 0 0 0
#2 First A br1 Alpha2 0 0 0 0 0
#3 Fourth C br2 Delta 0 0 0 0 0
#4 Second B br1 Beta 0 0 0 0 0
#5 Third C br2 Gamma 0 0 0 0 0
Upvotes: 1