Sean G
Sean G

Reputation: 385

Function to check if identical row, date minus 1 exists

I'd need a function that checks my data frame for an otherwise identical row, but with date minus 1, and returns true if it exists. It's a large data frame, so I'd like to do it as efficiently as possible.

For example, take the following data frame:

name  |date       
Timmy |01/Jan/2016
Timmy |02/Jan/2016
Timmy |03/Jan/2016
Sally |04/Jan/2016
Johnny|13/Feb/2016
Johnny|29/Mar/2016

The function should see Timmy|02/Jan/2016, check if Timmy|01/Jan/2016 exists, and return true. The resulting data frame would look like this:

name  |date       |hasDateMinusOne
Timmy |01/Jan/2016|false
Timmy |02/Jan/2016|true
Timmy |03/Jan/2016|true
Sally |04/Jan/2016|false
Johnny|13/Feb/2016|false
Johnny|29/Mar/2016|false

This is the closest answer I've found. Although it was answered by Hadley, it's 5 years old and predates dplyr. I'm wondering if it's still the most efficient way to handle 1,000,000+ rows.

Thanks!

Sean

Upvotes: 1

Views: 61

Answers (2)

akrun
akrun

Reputation: 887521

We can do this using only base R. Convert the 'date' to 'Date' class using transform, then with ave we group by 'name' and find whether the day before is found %in% the 'date' column.

df <- transform(df, date = as.Date(date, "%d/%b/%Y"))
df$hasDateMinusOne <- with(df, !!ave(as.integer(date), name,
            FUN = function(x) (x-1) %in% x))

If efficiency matters, another option is data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), change 'date' to 'Date' class, grouped by 'name' we find whether the previous day is found %in% the 'date' column.

setDT(df)[, date := as.Date(date, '%d/%b/%Y') 
     ][, hasDateMinusOne := (date-1) %in% date, by =  name]
df
#     name       date hasDateMinusOne
#1:  Timmy 2016-01-01           FALSE
#2:  Timmy 2016-01-02            TRUE
#3:  Timmy 2016-01-03            TRUE
#4:  Sally 2016-01-04           FALSE
#5: Johnny 2016-02-13           FALSE
#6: Johnny 2016-03-29           FALSE

Upvotes: 1

alistaire
alistaire

Reputation: 43354

If you format date as a date, you can just subtract one:

library(dplyr)

df %>% group_by(name) %>% 
    mutate(date = as.Date(date, '%d/%b/%Y'), 
           hasDateMinusOne = (date - 1) %in% date)

# Source: local data frame [6 x 3]
# Groups: name [3]
# 
#     name       date hasDateMinusOne
#   (fctr)     (date)           (lgl)
# 1  Timmy 2016-01-01           FALSE
# 2  Timmy 2016-01-02            TRUE
# 3  Timmy 2016-01-03            TRUE
# 4  Sally 2016-01-04           FALSE
# 5 Johnny 2016-02-13           FALSE
# 6 Johnny 2016-03-29           FALSE

Upvotes: 2

Related Questions