Reputation: 385
I'd need a function that checks my data frame for an otherwise identical row, but with date minus 1, and returns true
if it exists. It's a large data frame, so I'd like to do it as efficiently as possible.
For example, take the following data frame:
name |date
Timmy |01/Jan/2016
Timmy |02/Jan/2016
Timmy |03/Jan/2016
Sally |04/Jan/2016
Johnny|13/Feb/2016
Johnny|29/Mar/2016
The function should see Timmy|02/Jan/2016
, check if Timmy|01/Jan/2016
exists, and return true
. The resulting data frame would look like this:
name |date |hasDateMinusOne
Timmy |01/Jan/2016|false
Timmy |02/Jan/2016|true
Timmy |03/Jan/2016|true
Sally |04/Jan/2016|false
Johnny|13/Feb/2016|false
Johnny|29/Mar/2016|false
This is the closest answer I've found. Although it was answered by Hadley, it's 5 years old and predates dplyr. I'm wondering if it's still the most efficient way to handle 1,000,000+ rows.
Thanks!
Sean
Upvotes: 1
Views: 61
Reputation: 887521
We can do this using only base R
. Convert the 'date' to 'Date' class using transform
, then with ave
we group by 'name' and find whether the day before is found %in%
the 'date' column.
df <- transform(df, date = as.Date(date, "%d/%b/%Y"))
df$hasDateMinusOne <- with(df, !!ave(as.integer(date), name,
FUN = function(x) (x-1) %in% x))
If efficiency matters, another option is data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), change 'date' to 'Date' class, grouped by 'name' we find whether the previous day is found %in%
the 'date' column.
setDT(df)[, date := as.Date(date, '%d/%b/%Y')
][, hasDateMinusOne := (date-1) %in% date, by = name]
df
# name date hasDateMinusOne
#1: Timmy 2016-01-01 FALSE
#2: Timmy 2016-01-02 TRUE
#3: Timmy 2016-01-03 TRUE
#4: Sally 2016-01-04 FALSE
#5: Johnny 2016-02-13 FALSE
#6: Johnny 2016-03-29 FALSE
Upvotes: 1
Reputation: 43354
If you format date
as a date, you can just subtract one:
library(dplyr)
df %>% group_by(name) %>%
mutate(date = as.Date(date, '%d/%b/%Y'),
hasDateMinusOne = (date - 1) %in% date)
# Source: local data frame [6 x 3]
# Groups: name [3]
#
# name date hasDateMinusOne
# (fctr) (date) (lgl)
# 1 Timmy 2016-01-01 FALSE
# 2 Timmy 2016-01-02 TRUE
# 3 Timmy 2016-01-03 TRUE
# 4 Sally 2016-01-04 FALSE
# 5 Johnny 2016-02-13 FALSE
# 6 Johnny 2016-03-29 FALSE
Upvotes: 2