Map two data frames in a certain condition

Question

But now my current problem is slightly different that I couldn't use this solution or figure out . I'd like the data from data set 1 that occurred before dataset 2 and this is my data:

 # Dataset 1 (dts1)

     UserID   date   Hour     Events    
  1    5  25/07/2016  02:31      8         
  2    5  30/07/2016  02:42      6      
  3    4  23/07/2016  07:52      9         
  4   14  24/07/2016  03:02      5         
  5   17  25/07/2016  09:12      10        
  6    4  22/07/2016  03:22      4

and

   # Dataset 2 (dts2)

    UserID       date   Hour      transactions     
 1     5    25/07/2016  02:29      4         
 2     4    24/07/2016  02:42      2       
 3     5    25/07/2016  02:52      3         
 4     6    24/07/2016  03:02      4         
 5     6    25/07/2016  03:12      1        
 6    14    26/07/2016  03:22      3

So, I wish to compare those dataset from dataset 1 and only add those happened before dataset 2. In other words, I want to make sure that I do not count those events that happened after the last transaction of a user. The Ideal output is as follows:

    #output 

   UserID   Events      transaction 

    5         8         4,3
    4         9,4       2
   14         5         3
   17         10        NA

In the above example, I made sure that I removed event 6 for user 5 because it happened after his last transaction.

aichao · Accepted Answer

This is a modification of @dimitris_ps answer for your previous question. If he chooses to answer, I will gladly delete mine.

The main difference between this problem and your previous problem is that we now want all dts1 events that are before the last dts2 transaction for each specific UserID. Therefore, we want to group_by the UserID first and then filter for only those rows where the dts1 event times are less than the last dts2 transaction time. Then we can summarise both unique Events and transactions, still grouped by UserID.

The code is:

library(dplyr)

## I will not use the lubridate package, instead I will convert the time
## using as.POSIXct
dts1$time <- as.POSIXct(paste(dts1$date, dts1$Hour), format="%d/%m/%Y %H:%M")
dts2$time <- as.POSIXct(paste(dts2$date, dts2$Hour), format="%d/%m/%Y %H:%M")

# first join the two data.frames by UserID.
result <- left_join(dts1, dts2, by="UserID") %>%

# all subsequent processing is grouped by the UserID because we 
# want to compare the last transaction time to the Event times
# for each UserID.
group_by(UserID) %>%

# apply the filtering condition dts1 Event must be before last dts2 transaction.
# Note that we keep rows for which there is no row in  
# dts2 for a UserID in dts1. This is the case for UserID=17.
filter(is.na(time.y) | last(time.y) > time.x) %>% 

# summarise Events and transactions
summarise(Events = toString(unique(Events)), transactions = toString(unique(transactions)))

The results are:

print(result)
## A tibble: 4 x 3
##  UserID Events transactions
##             
##1      4   9, 4            2
##2      5      8         4, 3
##3     14      5            3
##4     17     10           NA

Hope this helps.

Map two data frames in a certain condition

Answers (2)

Related Questions