Reputation: 113
In R, I have a data frame that looks like this:
Female.ID Mate.ID relatedness
1 A1 C1 0.0000
2 A1 D1 0.0000
3 A1 E1 0.5062
4 A1 F1 NA
5 B1 G1 0.0425
6 B1 H1 0.0000
7 B1 I1 0.0349
8 B1 J1 0.0000
9 B1 K1 0.0000
10 B1 L1 0.0887
11 B1 M1 0.1106
12 B1 N1 0.0000
I want to create a new data frame containing the mean relatedness of all the mates for each Female.ID: one mean for A1, one for B1, and so on.
I want something like this:
Female.ID mean.relatedness
A1 0.1687
B1 0.0346
This dataframe is much bigger than this example one, which is why I'm not just subsetting for the female one by one and finding the mean relatedness. I was thinking of doing some kind of for loop, but I'm not sure how to start it off.
Upvotes: 0
Views: 65
Reputation: 1
The idea is to group the rows by Female.ID and compute the mean of relatedness within each group.
If the data is large, you may want a faster package such as data.table, which is fast and has a compact syntax. For more details, see the question "data.table vs dplyr: can one do something well the other can't or does poorly?"
In general, an explicit loop is not the efficient way to do this in R. Keep it as a last resort, for cases where the operation cannot be expressed with the package's grouped functions.
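As a point of comparison, a grouped mean needs no explicit loop even in base R. The following is a minimal sketch using tapply(), not part of the approach below; it rebuilds the question's sample data under the assumed name df, and the result name res is also an assumption.

```r
# Sample data copied from the question; `df` is an assumed name.
df <- data.frame(
  Female.ID   = c(rep("A1", 4), rep("B1", 8)),
  Mate.ID     = c("C1", "D1", "E1", "F1", "G1", "H1",
                  "I1", "J1", "K1", "L1", "M1", "N1"),
  relatedness = c(0, 0, 0.5062, NA, 0.0425, 0, 0.0349,
                  0, 0, 0.0887, 0.1106, 0)
)

# tapply() splits relatedness by Female.ID and applies mean() to each group.
# na.rm = TRUE is passed on to mean() so the NA for mate F1 is ignored.
means <- tapply(df$relatedness, df$Female.ID, mean, na.rm = TRUE)

# Turn the named vector into a two-column data frame.
res <- data.frame(
  Female.ID        = names(means),
  mean.relatedness = as.vector(means)
)
res
```

This stays in base R, so it needs no extra package, but on very large data the grouped operations in data.table or dplyr are usually the more comfortable choice.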
Here is the syntax using data.table (df being the initial data.frame):
library(data.table)
dt <- as.data.table(df)
dt1 <- dt[, .(mean.relatedness = mean(relatedness, na.rm = TRUE)),
          by = "Female.ID"]
> dt1
Female.ID mean.relatedness
1: A1 0.1687333
2: B1 0.0345875
Note that the grouping can be done over several variables, the summarizing function can be something other than the mean, and na.rm = TRUE is needed to ignore NA values while summarizing.
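To illustrate those last points, here is a rough sketch with several summary functions and a two-variable grouping. It rebuilds the question's sample data directly as a data.table; the names dt, stats, by_two, max.relatedness, n.mates and related are assumptions made for this example, and the second grouping variable is derived from relatedness only because the sample data has no other column to group on.

```r
library(data.table)

# Sample data copied from the question; `dt` is an assumed name.
dt <- data.table(
  Female.ID   = c(rep("A1", 4), rep("B1", 8)),
  Mate.ID     = c("C1", "D1", "E1", "F1", "G1", "H1",
                  "I1", "J1", "K1", "L1", "M1", "N1"),
  relatedness = c(0, 0, 0.5062, NA, 0.0425, 0, 0.0349,
                  0, 0, 0.0887, 0.1106, 0)
)

# Several summaries per female in one call.
# .N counts every row in the group, including the row whose relatedness is NA.
stats <- dt[, .(mean.relatedness = mean(relatedness, na.rm = TRUE),
                max.relatedness  = max(relatedness, na.rm = TRUE),
                n.mates          = .N),
            by = Female.ID]

# Grouping over two variables: Female.ID plus a derived flag.
# `related` is hypothetical; it is NA where relatedness is NA,
# so that row forms its own group.
by_two <- dt[, .(n.mates = .N),
             by = .(Female.ID, related = relatedness > 0)]

stats
by_two
```

With real data, the second grouping variable would normally be an existing column, listed inside .() next to Female.ID.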
Upvotes: 0
Reputation: 3092
You could use dplyr:
library(dplyr)
# Group by female, then average relatedness within each group, ignoring NA
themeans <- df %>%
  group_by(Female.ID) %>%
  summarize(mean.relatedness = mean(relatedness, na.rm = TRUE))
Upvotes: 4