Reputation: 3722
I have a table of orders, and a table of activity on accounts. I'm looking to find the most recent activity on the account with that order. I would like to iterate through each order to find the activity that matches the account, and the nearest date to the order date.
rec type date account nearest.rec
1 Order 12/1/2016 A
2 Order 11/14/2016 B
3 Activity 11/13/2016 A
4 Activity 10/15/2016 C
5 Order 11/13/2016 C
6 Activity 11/16/2016 A
7 Activity 11/17/2016 A
8 Activity 10/14/2016 B
9 Activity 11/4/2016 B
Would like to turn that into this:
rec type date account nearest.rec.actv
1 Order 12/1/2016 A 7
2 Order 11/14/2016 B 9
3 Activity 11/13/2016 A
4 Activity 10/15/2016 C
5 Order 11/13/2016 C 4
6 Activity 11/16/2016 A
7 Activity 11/17/2016 A
8 Activity 10/14/2016 B
9 Activity 11/4/2016 B
Or turning into its own data frame
rec type date account nearest.rec.actv actv.date
1 Order 12/1/2016 A 7 11/17/2016
2 Order 11/14/2016 B 9 11/4/2016
5 Order 11/13/2016 C 4 10/15/2016
Upvotes: 0
Views: 103
Reputation: 640
Here's my solution using dplyr
and purrr
:
library(dplyr)
library(lubridate)
library(purrr)
library(tidyr)
# Convert date column to Date format with lubridate function
df <- df %>% mutate_at("date", mdy)
# Get the date assigned to order type
list_order <- df %>%
filter(type == "Order") %>%
select(date, account) %>%
spread(account, date) %>%
as.list()
# Get the data frame filtered with Order type
df_order <- filter(df, type == "Order")
# For each account, check the closest date to the date stored in the list
map_df(seq_along(df_order), function(i) {
df %>%
filter(type == "Activity", account == names(list_order[i])) %>%
filter(date - list_order[[i]] == max(as.numeric(date - list_order[[i]])))
}) %>%
select(nearest.rec.actv = rec, actv.date = date, account) %>%
inner_join(df_order) %>%
select(rec, type, date, nearest.rec.actv, actv.date)
Upvotes: 0
Reputation: 8413
this is no where a efficient answer, but stepwise execution might help:
# subset into 2 dataframes
df1 <- df[df$type == "Order",]
df2 <- df[df$type == "Activity",]
# basic logic in the mutate() is that get the time difference for each record in a account. find the minimum, and get the corresponding activity date and record
df1 %>% group_by(account) %>%
mutate(x = df2$date[account==df2$account][which(min(difftime(date, df2$date[account == df2$account])) == difftime(date, df2$date[account == df2$account]))],
y = df2$rec[account==df2$account][which(min(difftime(date, df2$date[account == df2$account])) == difftime(date, df2$date[account == df2$account]))])
# rec type date account x y
# <int> <chr> <date> <chr> <date> <int>
#1 1 Order 2016-12-01 A 2016-11-17 7
#2 2 Order 2016-11-14 B 2016-11-04 9
#3 5 Order 2016-11-13 C 2016-10-15 4
Upvotes: 0
Reputation: 4473
Split the data by type
and merge by account
, then summarize
df$date <- as.Date(df$date, "%m/%d/%Y")
ind <- df$type=="Order"
df1 <- df[ind,]
df2 <- df[!ind,]
left_join(df1, df2, by="account") %>%
group_by(account) %>%
filter( date.x - date.y == min(date.x-date.y))
# rec.x type.x date.x account rec.y type.y date.y
# <int> <chr> <date> <chr> <int> <chr> <date>
#1 1 Order 2016-12-01 A 7 Activity 2016-11-17
#2 2 Order 2016-11-14 B 9 Activity 2016-11-04
#3 5 Order 2016-11-13 C 4 Activity 2016-10-15
Upvotes: 1