Reputation: 21204
I have a data frame:
'data.frame': 2611029 obs. of 10 variables:
$ eid : int 28 28 28 28 28 36 36 36 36 37 ...
$ created : Factor w/ 36204 levels "0000-00-00 00:00:00",..: NA NA NA NA NA NA NA NA NA NA ...
$ class_id : int NA NA NA NA NA NA NA NA NA NA ...
$ min.e.event_time.: Factor w/ 16175 levels "2013-04-15 11:17:19",..: NA NA NA NA NA NA NA NA NA NA ...
$ lead_date : Factor w/ 11199 levels "2012-10-11 18:39:12",..: NA NA NA NA NA NA NA NA NA NA ...
$ camp : int 44698 44698 44699 44701 44701 44715 44715 44909 44909 44699 ...
$ event_date : Factor w/ 695747 levels "2008-01-18 12:18:01",..: 1 5 2 32 36 6 17039 23 24 2 ...
$ event : Factor w/ 3 levels "click","open",..: 3 2 3 3 2 3 2 3 2 3 ...
$ message_name : Factor w/ 2707 levels ""," 2015-03 CAD Promotion Update",..: 2163 2163 2163 1106 1106 2163 2163 1990 1990 2163 ...
$ subject_lin : Factor w/ 2043 levels ""," Christie Office Holiday Hours",..: 613 613 613 248 248 613 613 612 612 613 ...
Each line item is an instance of a user (eid) having received an email (event_date).
event_date, lead_date and created are all dates. Until now I have converted them with as.Date() only after subsetting the data to records where complete.cases() holds for these dates. That let me aggregate and subset on conditions such as event_date < lead_date.
If I try to convert the dates in the data as is, without removing NA values, I get the message:
Error in charToDate(x) :
character string is not in a standard unambiguous format
The purpose of the analysis is to look at the impact of receiving an email on becoming a lead (thus lead_date would be populated, NA otherwise). I therefore don't want to exclude people who never became a lead by subsetting the entire df on complete lead dates.
But I still want to perform calculations on those records with dates, leaving the NAs as their own group.
Is there anything I can do here? I want R to ignore NA values when using functions like subset or aggregate. I also want to convert all the non-NA values into dates using as.Date().
**Following posting:** I probably could have asked this in a much simpler way: can I convert a field in a data frame to a date where that is feasible and leave it as NA otherwise?
Upvotes: 0
Views: 158
Reputation: 263331
Replace all your as.Date( ) calls with as.Date( , format="%Y-%m-%d"):
> as.Date(factor("0000-00-00 00:00:00"))
Error in charToDate(x) :
character string is not in a standard unambiguous format
> as.Date(factor("0000-00-00 00:00:00"), format="%Y-%m-%d")
[1] NA
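Applied to whole columns, the same idea might look like the sketch below. The data frame `df` and its values are made up for illustration; only the column names are taken from the question. Values that are NA, or that cannot be parsed with the given format, come back as NA instead of raising an error.

```r
# Toy data standing in for the real data frame (values are invented).
df <- data.frame(
  eid        = c(28, 36, 37),
  lead_date  = factor(c("2012-10-11 18:39:12", NA, "0000-00-00 00:00:00")),
  event_date = factor(c("2008-01-18 12:18:01", "2013-04-15 11:17:19",
                        "2014-02-01 09:00:00"))
)

# Convert each date column in place; an explicit format means
# unparseable strings become NA rather than stopping with an error.
date_cols <- c("lead_date", "event_date")
df[date_cols] <- lapply(df[date_cols], as.Date, format = "%Y-%m-%d")

str(df)
```

The time-of-day part of each string is ignored because the format only describes the date portion.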
Then describe the problems (code and errors) you encounter with the updated dataset. It's not possible to predict from the description where you are getting stuck on the next steps.
There is an is.na function that can be used in combination with other logical tests. Do remember that is.na(NA) | NA will return TRUE, because TRUE | NA is TRUE. The same does not hold for & (AND): TRUE & NA is NA.
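As a rough illustration of combining is.na with a date comparison, here is a sketch on an invented three-row data frame (`df`, `before_lead`, `never_lead` and `keep` are hypothetical names, not from the question):

```r
# Invented example: one user emailed before becoming a lead,
# one who never became a lead, one emailed after becoming a lead.
df <- data.frame(
  eid        = 1:3,
  event_date = as.Date(c("2013-01-01", "2013-06-01", "2013-03-01")),
  lead_date  = as.Date(c("2013-02-01", NA, "2013-02-15"))
)

# Emails sent before the lead date; the is.na() guard turns NA & ... into FALSE,
# so rows with a missing lead_date are dropped rather than returned as NA rows.
before_lead <- df[!is.na(df$lead_date) & df$event_date < df$lead_date, ]

# Users who never became a lead, kept as their own group.
never_lead <- df[is.na(df$lead_date), ]

# Either of the above in one step: TRUE | NA is TRUE, so NA rows are kept.
keep <- df[is.na(df$lead_date) | df$event_date < df$lead_date, ]
```

Note that subset() treats an NA condition as FALSE, whereas indexing with `[` returns a row of NAs for each NA in the condition, which is why the explicit is.na() test matters in the bracket form.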
Upvotes: 1