Reputation: 75
The Here is an example of my table:
custID | StartDate | EndDate | ReasonForEnd | TransactionType | TransactionDate
1a | NULL | 2/12/2014 | AccountClosed | AccountOpened | 1/15/2004
1a | NULL | 2/12/2014 | AccountClosed | Purchase | 3/16/2004
.......
2b | 7/7/2011 | 6/14/2013 | AccountClosed | AccountOpened | 8/1/2010
The problem has to do with the StartDate
field. For each custId
, if the entry is NULL then I want to replace with the TransactionDate
where TransactionType = AccountOpened
. If StartDate
is after the TransactionDate
where TransactionType = AccountOpened
, then replace with this date.
The actual data is over 250,000 rows. I really need some help figuring out how to write this in R.
Upvotes: 0
Views: 126
Reputation: 70296
You could try the following, however I didn't test it yet. I assume your data.frame is called df
:
require(dplyr)
df %>%
mutate_each(funs(as.Date(as.character(., format="%m/%d/%Y"))),
StartDate, EndDate, TransactionDate) %>%
group_by(custID) %>%
mutate(StartDate = ifelse(is.na(StartDate) | StartDate > TransactionDate[TransactionType == "AccountOpened"],
TransactionDate[TransactionType == "AccountOpened"], StartDate))
This code first converts several columns to Date
format (in this step, NULL entries will be converted to NA), groups by custID
and then checks if StartDate
is either NA
or greater than TransactionDate
where TransactionType == "AccountOpened"
and if TRUE, replaces StartDate
with TransactionDate
where TransactionType == "AccountOpened"
.
Upvotes: 1