Reputation: 47
I am looking for an R solution. I have the following data frame (this is a sample):
df <- data.frame(groupID = c("Jon", "Jon", "Jon","Jon", "Jon", "Maria", "Maria", "Ben", "Ben", "Tina", "Tina"),
breeding_attempt = c(1, 1, 1, 2, 2, 1, 1 , 1, 1, 1, 1),
year = c(1999, 1999, 1999, 1999, 1999, 2000, 2000, 2000, 2000, 2001, 2001),
femaleID = c("Jony", "Jona", "sami", "Jon", "Jona", "aa", "BB", "Tana", "tt", "gg", "HH"),
chicks = c(3, 0, 0, 0, 0, 2, 1, 3, 4, 1, 0))
I need to do two things, both using the breeding_attempt per year per groupID as the unit of calculation.
(a) How do I remove from the data all breeding_attempts in the same year AND for the same groupID in which all the participating females had 0 chicks? (e.g. breeding_attempt 2, year 1999, group "Jon" needs to be removed.) Please note that the grouping needs to have 3 levels: groupID -> year -> breeding_attempt.
(b) After subsetting the data as in (a), how do I calculate, from the subsetted data, the percentage of breeding_attempts, per year AND per group, in which only 1 female had >0 chicks and all other participating females had 0 chicks? (i.e. the percentage of breeding_attempts with a single successful female out of all breeding_attempts). In this sample, it should be 50%, as groups "Jon" 1999 1 and "Tina" 2001 1 had only one successful female.
Ideally, I will also be able to get a data frame that summarises the raw data. Namely, a dataframe in which each line represents a breeding_attempt per year per group ID and a column indicating whether there was only 1 successful female or not.
I tried working with the aggregate function, but I am new to R and did not get far with it...
Thanks!
Upvotes: 0
Views: 56
Reputation: 1664
Since you seem to be looking for a base R solution, here is mine:
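# Question a
# Total chicks per groupID/year/breeding_attempt
agg_a <- aggregate(chicks ~ groupID + year + breeding_attempt, data = df, FUN = sum)
# Keep only the combinations with at least one chick. Merging on all three
# keys matches whole combinations; testing each column separately with %in%
# could also drop unrelated rows once several attempts have zero chicks.
keep <- agg_a[agg_a$chicks > 0, c("groupID", "year", "breeding_attempt")]
df2 <- merge(df, keep)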
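# Question b
# Number of successful females (chicks > 0) per groupID/year/breeding_attempt
agg_b <- aggregate(chicks > 0 ~ groupID + year + breeding_attempt, data = df2, FUN = sum)
# TRUE if exactly one female was successful in that attempt
agg_b$just1 <- agg_b$`chicks > 0` == 1
# Proportion of attempts with a single successful female (multiply by 100 for a percentage)
sum(agg_b$just1) / nrow(agg_b)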
I think the agg_b data.frame provides the summarization you were also looking for.
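As a variation on the above, here is a minimal sketch that uses ave() instead of matching against the aggregated table. ave() returns the group total aligned with each original row, so the filter is a plain logical index. The names total, success, summary_df and n_successful are made up for this illustration.

```r
# Sample data from the question
df <- data.frame(
  groupID = c("Jon", "Jon", "Jon", "Jon", "Jon", "Maria", "Maria", "Ben", "Ben", "Tina", "Tina"),
  breeding_attempt = c(1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1),
  year = c(1999, 1999, 1999, 1999, 1999, 2000, 2000, 2000, 2000, 2001, 2001),
  femaleID = c("Jony", "Jona", "sami", "Jon", "Jona", "aa", "BB", "Tana", "tt", "gg", "HH"),
  chicks = c(3, 0, 0, 0, 0, 2, 1, 3, 4, 1, 0)
)

# (a) Total chicks of the attempt each row belongs to, one value per row
total <- ave(df$chicks, df$groupID, df$year, df$breeding_attempt, FUN = sum)
df2 <- df[total > 0, ]

# (b) One row per attempt: number of successful females and a single-success flag
df2$success <- df2$chicks > 0
summary_df <- aggregate(success ~ groupID + year + breeding_attempt, data = df2, FUN = sum)
names(summary_df)[names(summary_df) == "success"] <- "n_successful"
summary_df$just1 <- summary_df$n_successful == 1

# Percentage of attempts with exactly one successful female
pct <- 100 * mean(summary_df$just1)
pct  # 50 for the sample data
```

Unlike merge(), indexing with ave() keeps the rows of df in their original order.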
Since you are new to R and tried to use aggregate, you may not know that there is a collection of R packages called the tidyverse, which has its own syntax and is often contrasted with base R. For beginners it may be difficult to learn the base R and the tidyverse way of doing things at the same time, which is why you may want to stick with base R for now.
Nevertheless, here is a possible tidyverse solution:
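library(dplyr)  # the native pipe |> requires R >= 4.1
# Question a
# Keep only the attempts in which at least one female had chicks
df2 <- df |>
group_by(groupID, year, breeding_attempt) |>
filter(sum(chicks) > 0) |>
ungroup()
# Question b
# One row per attempt; just1 is TRUE if exactly one female was successful
agg_b <- df2 |>
group_by(groupID, year, breeding_attempt) |>
summarise(just1 = sum(chicks > 0) == 1, .groups = "drop")
sum(agg_b$just1) / nrow(agg_b)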
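If "per year AND per group" in (b) is meant literally, i.e. one percentage for each groupID and year rather than a single overall figure, a possible extension is to summarise a second time at the coarser grouping level. This is only a sketch of that reading of the question; per_group_year and pct_single are names invented here.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  groupID = c("Jon", "Jon", "Jon", "Jon", "Jon", "Maria", "Maria", "Ben", "Ben", "Tina", "Tina"),
  breeding_attempt = c(1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1),
  year = c(1999, 1999, 1999, 1999, 1999, 2000, 2000, 2000, 2000, 2001, 2001),
  femaleID = c("Jony", "Jona", "sami", "Jon", "Jona", "aa", "BB", "Tana", "tt", "gg", "HH"),
  chicks = c(3, 0, 0, 0, 0, 2, 1, 3, 4, 1, 0)
)

per_group_year <- df |>
  group_by(groupID, year, breeding_attempt) |>
  filter(sum(chicks) > 0) |>                                    # drop all-zero attempts
  summarise(just1 = sum(chicks > 0) == 1, .groups = "drop") |>  # one row per attempt
  group_by(groupID, year) |>
  summarise(pct_single = 100 * mean(just1), .groups = "drop")   # one row per group and year

per_group_year
```

With the sample data every group and year has only one remaining attempt, so each percentage is either 0 or 100; the column becomes more informative once a group has several attempts in the same year.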
Upvotes: 1