Reputation: 393
I have a large df like the one below, where I want to know (using the terms in the made up df) know which id that have the same fruit for the longest period of time in this biannually event. I.e. the opportunity to hold a fruit only occurs every other year.
df<-data.frame("id"=c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
"Year"=c(1981, 1981, 1985, 2011, 2011, 2013, 2015, 1921, 1923, 1955),
"fruit"=c("banana", "apple", "banana", "orange", "melon", "orange",
"orange", "melon", "melon", "melon"))
I have tried different kinds of group_by
, and cumsum
see below.
df<-df %>% mutate(year_diff=cumsum(c(1, diff(df$Year)>1)))
df %>% group_by(id, fruit) %>% filter(year_diff==2)
And the one below (after reloading the df)
df %>% group_by(id, fruit) %>% mutate(year_diff=cumsum(c(1, diff(df$Year)>1)))
And played around with:
df %>% group_by(id, fruit) %>% mutate(summarise(n_years=n_distinct(Year)))
In the end I ideally want a tibble like the one below arranging the id's (with their fruits) in order of who have the most consecutive "holds" of a fruit in the events (over time). Remember that the event only takes place every other year.
id fruit occurence
2 orange 3
3 melon 2
1 banana 1
1 apple 1
2 melon 1
3 melon 1
I understand that there are several steps.
EDIT: Maybe there is a way to modify this:
df %>% group_by(id, fruit) %>% summarise(n_years=n_distinct(Year)) %>% arrange(desc(n_years)) %>% ungroup()
so that it creates a new column in the original tibble (which I am unable to do, but you might be), and then I can filter consecutive events?
Upvotes: 0
Views: 86
Reputation: 389265
Using dplyr
we arrange
rows by id
, fruit
and Year
and create a new grouping variable (group
) showing consecutive years for each id
and fruit
and then count the number of rows in each group.
library(dplyr)
df %>%
arrange(id, fruit, Year) %>%
group_by(id, fruit, group = cumsum(c(2, diff(Year)) != 2)) %>%
summarise(n = n()) %>%
ungroup() %>%
select(-group)
# id fruit n
# <dbl> <fct> <int>
#1 1 apple 1
#2 1 banana 1
#3 1 banana 1
#4 2 melon 1
#5 2 orange 3
#6 3 melon 2
#7 3 melon 1
Upvotes: 2