MFR
MFR

Reputation: 2077

Map two data-frames in R in condition that the time of one of them is earlier than the other

I wish to merge two data sets by their user ID. My problem is I had to filter those of a data set that happened after the other. A simple example is

# Dataset 1 (dts1)

  User ID       date   Hour      transactions     
1     5    25/07/2016  02:32      4         
2     6    24/07/2016  02:42      2       
3     8    25/07/2016  02:52      3         
4     9    24/07/2016  03:02      4         
5    11    25/07/2016  03:12      1        
6    13    26/07/2016  03:22      3         

and

 # Dataset 2 (dts2)

   User ID   date   Hour     Events    
1     5    25/07/2016  02:31      8         
2     5    26/07/2016  02:42      6      
3     5    24/07/2016  07:52      9         
4    14    24/07/2016  03:02      5         
5     5    25/07/2016  09:12      10        
6     4    26/07/2016  03:22      4    

I wish only map those from data set2 which happened before dataset1. So, ideally my output will be like that

#output 
    User ID   Events   Events    transactions    
1     5         8        9            4

Upvotes: 2

Views: 83

Answers (2)

dimitris_ps
dimitris_ps

Reputation: 5951

An alternative approach with dplyr and lubridate

# install.packages("dplyr")
# install.packages("lubridate")

library(dplyr)
library(lubridate)

# join the two data.frames by Used_ID
left_join(dts1, dts2, by="User_ID") %>% 

# apply the filtering condition. dts1 must be after dts2
  filter(dmy_hm(paste(date.x, Hour.x)) > 
         dmy_hm(paste(date.y, Hour.y))) %>% 

# Collapse the Events by user and transaction
  group_by(User_ID, transactions) %>% summarise(Events = toString(Events))

Upvotes: 1

Weihuang Wong
Weihuang Wong

Reputation: 13138

Given data dts1 and dts2, and assuming date and Hour are characters:

> dts1
  UserID       date  Hour transactions
1      5 25/07/2016 02:32            4
2      6 24/07/2016 02:42            2
3      8 25/07/2016 02:52            3
4      9 24/07/2016 03:02            4
5     11 25/07/2016 03:12            1
6     13 26/07/2016 03:22            3
> dts2
  UserID       date  Hour Events
1      5 25/07/2016 02:31      8
2      5 26/07/2016 02:42      6
3      5 24/07/2016 07:52      9
4     14 24/07/2016 03:02      5
5      5 25/07/2016 09:12     10
6      4 26/07/2016 03:22      4

The basic idea is to make the times in the two dataframes comparable. First we convert the date/hour in dts2 into POSIX class:

dts2$time <- strptime(paste(dts2$date, dts2$Hour), format="%d/%m/%Y %H:%M")

Then we use apply to iterate through dts1, finding rows from dts2 that match UserID and with the condition that the time is earlier than the time in dataset 1:

dts1$Events <- apply(dts1[,c("UserID","date","Hour")], MAR=1, function(x) {
    time1 <- strptime(paste(x[2], x[3]), format="%d/%m/%Y %H:%M")
    rows <- which(dts2$UserID==as.numeric(x[1]) & dts2$time<time1)
    if (length(rows)>0) {            
        dts2$Events[rows]
    } else {
        NA
    }
})

Result:

> dts1
  UserID       date  Hour transactions Events
1      5 25/07/2016 02:32            4   8, 9
2      6 24/07/2016 02:42            2     NA
3      8 25/07/2016 02:52            3     NA
4      9 24/07/2016 03:02            4     NA
5     11 25/07/2016 03:12            1     NA
6     13 26/07/2016 03:22            3     NA

Upvotes: 0

Related Questions