MathLover
MathLover

Reputation: 75

R: Writing loops to replace NULL with Dates

The Here is an example of my table:

custID | StartDate | EndDate   | ReasonForEnd  | TransactionType | TransactionDate
    1a |  NULL     | 2/12/2014 | AccountClosed |  AccountOpened  |  1/15/2004
    1a |  NULL     | 2/12/2014 | AccountClosed |  Purchase       |  3/16/2004
    .......
    2b | 7/7/2011  | 6/14/2013 | AccountClosed | AccountOpened   |  8/1/2010

The problem has to do with the StartDate field. For each custId, if the entry is NULL then I want to replace with the TransactionDate where TransactionType = AccountOpened. If StartDate is after the TransactionDate where TransactionType = AccountOpened, then replace with this date.

The actual data is over 250,000 rows. I really need some help figuring out how to write this in R.

Upvotes: 0

Views: 126

Answers (1)

talat
talat

Reputation: 70296

You could try the following, however I didn't test it yet. I assume your data.frame is called df:

require(dplyr)

df %>%
    mutate_each(funs(as.Date(as.character(., format="%m/%d/%Y"))), 
                StartDate, EndDate, TransactionDate) %>%
    group_by(custID) %>%
    mutate(StartDate = ifelse(is.na(StartDate) | StartDate > TransactionDate[TransactionType == "AccountOpened"], 
                          TransactionDate[TransactionType == "AccountOpened"], StartDate))

This code first converts several columns to Date format (in this step, NULL entries will be converted to NA), groups by custID and then checks if StartDate is either NA or greater than TransactionDate where TransactionType == "AccountOpened" and if TRUE, replaces StartDate with TransactionDate where TransactionType == "AccountOpened".

Upvotes: 1

Related Questions