jolii

Reputation: 97

Filtering dates per factor group

I have these two datasets:

set.seed(1)
df1<- data.frame(
  user = as.factor(rep(c("mike","john","david", "gabriel"), each =4)),
  trx_date = sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 16)
)

df2 <- data.frame(
  user = as.factor(c("mike","john","david")),
  filter_date = as.Date(c("1999-07-29", "1999-03-08", "1999-10-24"))
)

How do I filter df1 so that, for each user, I keep only the rows whose trx_date falls after that user's filter_date in df2?

Upvotes: 0

Views: 26

Answers (3)

akrun

Reputation: 887118

In base R, we can use merge with subset:

subset(merge(df1, df2, by = 'user'), trx_date > filter_date)
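For illustration only, the same per-user comparison can be sketched without a merge by looking up each row's cutoff with match. The small data frames below are made up for the example (they are not the question's sampled data), and cutoff and res are hypothetical names. Unlike the merge version, this keeps the original row order and does not add a filter_date column.

```r
# Made-up data with fixed dates, so the outcome is easy to follow
df1 <- data.frame(
  user = c("mike", "mike", "john", "gabriel"),
  trx_date = as.Date(c("1999-08-15", "1999-02-01", "1999-03-09", "1999-05-05"))
)
df2 <- data.frame(
  user = c("mike", "john"),
  filter_date = as.Date(c("1999-07-29", "1999-03-08"))
)

# Look up each row's cutoff by user; users absent from df2 get NA
cutoff <- df2$filter_date[match(df1$user, df2$user)]

# which() discards the NA comparisons, so users without a cutoff are dropped,
# which matches the default (inner) behaviour of merge
res <- df1[which(df1$trx_date > cutoff), ]
res
```

Here "gabriel" is dropped because there is no cutoff to compare against, just as in the merge-based one-liner.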

Upvotes: 0

MrGumble

Reputation: 5766

Using the dplyr package, you could do:

library(dplyr)
full_join(df1, df2, by=c('user')) %>%
  group_by(user) %>%
  filter(trx_date >= filter_date)

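But what do you want to do with "gabriel"? That user does not exist in df2, so how should those rows be filtered? With the above solution they are lost, because filter_date is NA for them and filter drops rows where the condition evaluates to NA. If you want to keep them, replace the filter with filter(trx_date >= filter_date | is.na(filter_date)). (Note the single |, which works element-wise on vectors, as opposed to ||, which expects single values.)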
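A minimal sketch of the variant that keeps users without a filter_date, assuming dplyr is installed. It uses left_join rather than full_join, which is a substitution on my part: both give the same rows here because every user in df2 also appears in df1. The name kept is hypothetical.

```r
library(dplyr)

# Data from the question
set.seed(1)
df1 <- data.frame(
  user = as.factor(rep(c("mike", "john", "david", "gabriel"), each = 4)),
  trx_date = sample(seq(as.Date("1999/01/01"), as.Date("2000/01/01"), by = "day"), 16)
)
df2 <- data.frame(
  user = as.factor(c("mike", "john", "david")),
  filter_date = as.Date(c("1999-07-29", "1999-03-08", "1999-10-24"))
)

kept <- df1 %>%
  left_join(df2, by = "user") %>%
  # rows with no cutoff (filter_date is NA) are kept explicitly
  filter(trx_date >= filter_date | is.na(filter_date))

kept
```

All four "gabriel" rows survive, with NA in the filter_date column.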

Upvotes: 1

Ronak Shah

Reputation: 388982

You can join the two data frames and then filter:

library(dplyr)
df1 %>%
  inner_join(df2, by = 'user') %>%
  filter(trx_date > filter_date)
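The answers in this thread differ in using > versus >=, which only matters for a transaction dated exactly on filter_date. A small sketch with made-up data (tx, cutoffs, strict and inclusive are hypothetical names) shows the difference, assuming "after" in the question means strictly after:

```r
library(dplyr)

# One transaction on the cutoff date itself, one the day after
tx <- data.frame(
  user = c("mike", "mike"),
  trx_date = as.Date(c("1999-07-29", "1999-07-30"))
)
cutoffs <- data.frame(user = "mike", filter_date = as.Date("1999-07-29"))

joined <- inner_join(tx, cutoffs, by = "user")

strict    <- filter(joined, trx_date > filter_date)   # excludes the cutoff day
inclusive <- filter(joined, trx_date >= filter_date)  # includes the cutoff day
```

With >, only the 1999-07-30 row remains; with >=, both rows do.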

Upvotes: 1
