Matt W.
Matt W.

Reputation: 3722

Find most recent record by date, based on other criteria

I have a table of orders, and a table of activity on accounts. I'm looking to find the most recent activity on the account with that order. I would like to iterate through each order to find the activity that matches the account, and the nearest date to the order date.

rec   type      date         account   nearest.rec
1     Order     12/1/2016    A
2     Order     11/14/2016   B
3     Activity  11/13/2016   A
4     Activity  10/15/2016   C
5     Order     11/13/2016   C
6     Activity  11/16/2016   A
7     Activity  11/17/2016   A
8     Activity  10/14/2016   B
9     Activity  11/4/2016    B

Would like to turn that into this:

rec   type      date         account   nearest.rec.actv
1     Order     12/1/2016    A         7
2     Order     11/14/2016   B         9
3     Activity  11/13/2016   A
4     Activity  10/15/2016   C
5     Order     11/13/2016   C         4
6     Activity  11/16/2016   A
7     Activity  11/17/2016   A
8     Activity  10/14/2016   B
9     Activity  11/4/2016    B

Or turning into its own data frame

rec   type      date         account   nearest.rec.actv  actv.date
1     Order     12/1/2016    A         7                 11/17/2016
2     Order     11/14/2016   B         9                 11/4/2016
5     Order     11/13/2016   C         4                 10/15/2016

Upvotes: 0

Views: 103

Answers (3)

denrou
denrou

Reputation: 640

Here's my solution using dplyr and purrr:

library(dplyr)
library(lubridate)
library(purrr)
library(tidyr)

# Convert date column to Date format with lubridate function
df <- df %>% mutate_at("date", mdy)

# Get the date assigned to order type
list_order <- df %>%
  filter(type == "Order") %>% 
  select(date, account) %>% 
  spread(account, date) %>% 
  as.list()

# Get the data frame filtered with Order type
df_order <- filter(df, type == "Order")

# For each account, check the closest date to the date stored in the list
map_df(seq_along(df_order), function(i) {
  df %>% 
    filter(type == "Activity", account == names(list_order[i])) %>% 
    filter(date - list_order[[i]] == max(as.numeric(date - list_order[[i]])))
}) %>% 
  select(nearest.rec.actv = rec, actv.date = date, account) %>% 
  inner_join(df_order) %>% 
  select(rec, type, date, nearest.rec.actv, actv.date)

Upvotes: 0

joel.wilson
joel.wilson

Reputation: 8413

this is no where a efficient answer, but stepwise execution might help:

# subset into 2 dataframes
df1 <- df[df$type == "Order",]
df2 <- df[df$type == "Activity",]

# basic logic in the mutate() is that get the time difference for each record in a account. find the minimum, and get the corresponding activity date and record
df1 %>% group_by(account) %>% 
  mutate(x = df2$date[account==df2$account][which(min(difftime(date, df2$date[account == df2$account])) == difftime(date, df2$date[account == df2$account]))],
         y = df2$rec[account==df2$account][which(min(difftime(date, df2$date[account == df2$account])) == difftime(date, df2$date[account == df2$account]))])

#    rec  type       date account          x     y
#  <int> <chr>     <date>   <chr>     <date> <int>
#1     1 Order 2016-12-01       A 2016-11-17     7
#2     2 Order 2016-11-14       B 2016-11-04     9
#3     5 Order 2016-11-13       C 2016-10-15     4

Upvotes: 0

ExperimenteR
ExperimenteR

Reputation: 4473

Split the data by type and merge by account, then summarize

df$date <- as.Date(df$date, "%m/%d/%Y")
ind <- df$type=="Order"
df1 <- df[ind,]
df2 <- df[!ind,]
left_join(df1, df2, by="account") %>% 
  group_by(account) %>% 
  filter( date.x - date.y == min(date.x-date.y))

#  rec.x type.x     date.x account rec.y   type.y     date.y
#  <int>  <chr>     <date>   <chr> <int>    <chr>     <date>
#1     1  Order 2016-12-01       A     7 Activity 2016-11-17
#2     2  Order 2016-11-14       B     9 Activity 2016-11-04
#3     5  Order 2016-11-13       C     4 Activity 2016-10-15

Upvotes: 1

Related Questions