Reputation: 2077
I wish to merge two data sets by their user ID. My problem is I had to filter those of a data set that happened after the other. A simple example is
# Dataset 1 (dts1)
User ID date Hour transactions
1 5 25/07/2016 02:32 4
2 6 24/07/2016 02:42 2
3 8 25/07/2016 02:52 3
4 9 24/07/2016 03:02 4
5 11 25/07/2016 03:12 1
6 13 26/07/2016 03:22 3
and
# Dataset 2 (dts2)
User ID date Hour Events
1 5 25/07/2016 02:31 8
2 5 26/07/2016 02:42 6
3 5 24/07/2016 07:52 9
4 14 24/07/2016 03:02 5
5 5 25/07/2016 09:12 10
6 4 26/07/2016 03:22 4
I wish only map those from data set2 which happened before dataset1. So, ideally my output will be like that
#output
User ID Events Events transactions
1 5 8 9 4
Upvotes: 2
Views: 83
Reputation: 5951
An alternative approach with dplyr
and lubridate
# install.packages("dplyr")
# install.packages("lubridate")
library(dplyr)
library(lubridate)
# join the two data.frames by Used_ID
left_join(dts1, dts2, by="User_ID") %>%
# apply the filtering condition. dts1 must be after dts2
filter(dmy_hm(paste(date.x, Hour.x)) >
dmy_hm(paste(date.y, Hour.y))) %>%
# Collapse the Events by user and transaction
group_by(User_ID, transactions) %>% summarise(Events = toString(Events))
Upvotes: 1
Reputation: 13138
Given data dts1
and dts2
, and assuming date
and Hour
are characters:
> dts1
UserID date Hour transactions
1 5 25/07/2016 02:32 4
2 6 24/07/2016 02:42 2
3 8 25/07/2016 02:52 3
4 9 24/07/2016 03:02 4
5 11 25/07/2016 03:12 1
6 13 26/07/2016 03:22 3
> dts2
UserID date Hour Events
1 5 25/07/2016 02:31 8
2 5 26/07/2016 02:42 6
3 5 24/07/2016 07:52 9
4 14 24/07/2016 03:02 5
5 5 25/07/2016 09:12 10
6 4 26/07/2016 03:22 4
The basic idea is to make the times in the two dataframes comparable. First we convert the date/hour in dts2
into POSIX class:
dts2$time <- strptime(paste(dts2$date, dts2$Hour), format="%d/%m/%Y %H:%M")
Then we use apply
to iterate through dts1
, finding rows from dts2
that match UserID
and with the condition that the time is earlier than the time in dataset 1:
dts1$Events <- apply(dts1[,c("UserID","date","Hour")], MAR=1, function(x) {
time1 <- strptime(paste(x[2], x[3]), format="%d/%m/%Y %H:%M")
rows <- which(dts2$UserID==as.numeric(x[1]) & dts2$time<time1)
if (length(rows)>0) {
dts2$Events[rows]
} else {
NA
}
})
Result:
> dts1
UserID date Hour transactions Events
1 5 25/07/2016 02:32 4 8, 9
2 6 24/07/2016 02:42 2 NA
3 8 25/07/2016 02:52 3 NA
4 9 24/07/2016 03:02 4 NA
5 11 25/07/2016 03:12 1 NA
6 13 26/07/2016 03:22 3 NA
Upvotes: 0