Reputation: 1771
A follow-up on this question (I want to keep the threads separate): I want to look at each user and the fruits they ate, but I'm only interested in the first time they ate each fruit. From there, I want to rank the fruits in the order they were first eaten.
Some data:
set.seed(1234)
library(dplyr)
data <- data.frame(
  user = sample(c("1234", "9876", "4567"), 30, replace = TRUE),
  fruit = sample(c("banana", "apple", "pear", "lemon"), 30, replace = TRUE),
  date = rep(seq(as.Date("2010-02-01"), length.out = 10, by = "1 day"), 3))
data <- data %>% arrange(user, date)
In this case, you can see that, for example, User 1234 ate a banana on 2010-02-01, then again on 02-03, 02-04, and 02-05.
  user  fruit       date
1 1234 banana 2010-02-01
2 1234  lemon 2010-02-02
3 1234 banana 2010-02-03
4 1234  apple 2010-02-03
5 1234  lemon 2010-02-03
6 1234 banana 2010-02-04
7 1234 banana 2010-02-05
I don't want to change anything about the relative order of fruits by time, but I do want to remove all subsequent instances of "banana" after the first one (and likewise with every other fruit).
For user 1234 in this case, I'm looking for:
  user  fruit       date
1 1234 banana 2010-02-01
2 1234  lemon 2010-02-02
4 1234  apple 2010-02-03
One way I can think of going about this is arranging the dataframe by user > fruit > date, then keeping only the first unique observation of "fruit" by the user grouping. I'm getting hung up on how exactly to do that in dplyr. Any thoughts?
Upvotes: 1
Views: 1528
Reputation: 3753
A dplyr solution would involve grouping by the user and fruit variables and filtering for rows with the lowest ranked date:
data %>%
  group_by(user, fruit) %>%
  filter(row_number(date) == 1)
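To make the behaviour easier to check by hand, here is a minimal sketch on a small hand-built data frame. The `toy` data and the `first_eaten` name are made up for illustration and are not the question's sample. Within each user/fruit group, `row_number(date)` ranks the dates and breaks ties by order of appearance, so exactly one row per group survives. The trailing `ungroup()` and `arrange()` are additions here, assuming you want a plain, chronologically ordered result.

```r
library(dplyr)

# Hypothetical toy data (not the question's sample): one user, repeated fruits
toy <- data.frame(
  user  = c("1234", "1234", "1234", "1234", "1234"),
  fruit = c("banana", "lemon", "banana", "apple", "lemon"),
  date  = as.Date(c("2010-02-01", "2010-02-02", "2010-02-03",
                    "2010-02-03", "2010-02-03")),
  stringsAsFactors = FALSE
)

first_eaten <- toy %>%
  group_by(user, fruit) %>%
  filter(row_number(date) == 1) %>%  # keep the earliest date per user/fruit pair
  ungroup() %>%                      # drop the grouping before further work
  arrange(user, date)                # chronological order within each user

first_eaten$fruit
```

On this toy input the surviving fruits should be banana, lemon and apple, in that order, which matches the shape of the output the question asks for.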
Upvotes: 1
Reputation: 28441
Here is an approach using the duplicated function.
data %>%
  group_by(user) %>%
  filter(!duplicated(fruit))
#    user  fruit       date
# 1  1234  apple 2010-02-01
# 2  1234 banana 2010-02-01
# 3  1234   pear 2010-02-03
# 4  1234  lemon 2010-02-10
# 5  4567   pear 2010-02-01
# 6  4567 banana 2010-02-05
# 7  4567  lemon 2010-02-08
# 8  9876  apple 2010-02-02
# 9  9876   pear 2010-02-02
# 10 9876  lemon 2010-02-06
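One thing worth noting: `duplicated()` keeps the first row it encounters, so this only returns the first time a fruit was eaten if the rows are already sorted by date within each user (the question's data is, thanks to the `arrange(user, date)` step). The sketch below sorts explicitly on a deliberately shuffled, made-up data frame (`toy`, `by_duplicated` and `by_distinct` are illustrative names, not from the question). It also shows `distinct()` with `.keep_all = TRUE` as a possible alternative, assuming a dplyr version that supports that argument.

```r
library(dplyr)

# Hypothetical toy data, deliberately NOT sorted by date
toy <- data.frame(
  user  = c("1234", "1234", "1234", "1234", "1234"),
  fruit = c("banana", "banana", "lemon", "apple", "lemon"),
  date  = as.Date(c("2010-02-03", "2010-02-01", "2010-02-02",
                    "2010-02-03", "2010-02-03")),
  stringsAsFactors = FALSE
)

# Sort first, then drop every repeat of a fruit within each user
by_duplicated <- toy %>%
  arrange(user, date) %>%
  group_by(user) %>%
  filter(!duplicated(fruit)) %>%
  ungroup()

# Alternative: keep the first row for each user/fruit combination
by_distinct <- toy %>%
  arrange(user, date) %>%
  distinct(user, fruit, .keep_all = TRUE)

by_duplicated$fruit
by_distinct$fruit
```

Both versions should keep banana (2010-02-01), lemon (2010-02-02) and apple (2010-02-03) for this toy input. Without the `arrange()` step, the banana row dated 2010-02-03 would be kept instead of the earlier one.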
Upvotes: 4