Reputation: 101
The data I'm trying to convert is supposed to be a date; however, it is formatted as mmddyyyy with no separation by dashes or slashes. In order to work with dates in R, I would like to have this formatted as mm-dd-yyyy or mm/dd/yyyy.
I think I might need to use grep(), but I'm not sure how to use it to reformat all of the dates that are in the mmddyyyy format.
Upvotes: 10
Views: 53324
Reputation: 2046
Updated: Improved with @Richard Scriven's colClasses and simpler as.Date() suggestions.
Here are two similar methods that worked for me, going from a CSV file containing dates in mmddyyyy format to getting R to recognize them as Date objects.
Start with a simple file, tv.csv:
Series,FirstAir
Quantico,09272015
Muppets,09222015
Once within R,
> t = read.csv('tv.csv', colClasses = 'character')
This reads tv.csv into a data frame named t. The colClasses = 'character' option causes all the data to be read as the character data type (instead of Factor or int types).
Examine its initial structure:
> str(t)
'data.frame': 2 obs. of 2 variables:
$ Series : chr "Quantico" "Muppets"
$ FirstAir: chr "09272015" "09222015"
Both columns are now chr. These strings of characters are then easily converted into a date:
> t$FirstAir = as.Date(t$FirstAir, "%m%d%Y")
as.Date() performs the string-to-date conversion, and "%m%d%Y" specifies how to interpret the input in t$FirstAir: %m is the month, %d the day of the month and %Y the four-digit year. These format codes, at least on Linux, can be found by running $ man date, which brings up the manual for the date program with a list of formatting codes. For example, it says %m month (01..12). Within R, the same codes are documented on the ?strptime help page.
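A self-contained sketch of method 1, assuming the same two rows of data. read.csv(text = ...) stands in for the tv.csv file, and the names csv_text and tv are illustrative. It also shows format(), which turns a Date back into the mm/dd/yyyy or mm-dd-yyyy text form mentioned in the question, in case that form is needed for display.

```r
# Inline copy of tv.csv so the example runs without creating a file
csv_text <- "Series,FirstAir
Quantico,09272015
Muppets,09222015"

# Method 1: read every column as character, so the leading zeros survive
tv <- read.csv(text = csv_text, colClasses = "character")

# %m = month (01-12), %d = day of month (01-31), %Y = four-digit year
tv$FirstAir <- as.Date(tv$FirstAir, "%m%d%Y")
str(tv)

# format() converts the Date back into a string using the same format codes
format(tv$FirstAir, "%m/%d/%Y")  # "09/27/2015" "09/22/2015"
format(tv$FirstAir, "%m-%d-%Y")  # "09-27-2015" "09-22-2015"
```

Note that the Date object itself is what R works with; the mm/dd/yyyy text is only a presentation of it.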
If for some reason you don't want a blanket conversion of every column to character on import, for example with a file that has many variables where you wish to keep R's automatic type recognition and merely "fix" the one date variable, follow this second method.
Once within R,
> t = read.csv('tv.csv')
This reads tv.csv into a data frame named t.
Examine its initial structure:
> str(t)
'data.frame': 2 obs. of 2 variables:
$ Series : Factor w/ 2 levels "Muppets","Quantico": 2 1
$ FirstAir: int 9272015 9222015
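(In R 4.0.0 and later, read.csv() defaults to stringsAsFactors = FALSE, so Series shows as chr rather than Factor; FirstAir is still int.)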
For the FirstAir variable, R has imported 09272015 as int, meaning integer, and dropped the leading zero padding. The 0 in 09 is important later for the date conversion, yet R has imported the value without it, so we need to fix this. This can be done in a single command, but for clarity I have broken it into two steps. First,
> t$FirstAir = sprintf("%08d", t$FirstAir)
In this command:
- sprintf is a formatting function
- the 0 in "%08d" means pad with zeroes
- the 8 means ensure a width of 8 characters, because mmddyyyy is 8 characters in total
- the d is used when the input is an integer, which it currently is (recall that the str() output claimed t$FirstAir is an int)
- t$FirstAir is the variable we are both setting and using as input

Check the result:
> str(t$FirstAir)
chr [1:2] "09272015" "09222015"
The type has changed from int to chr; for example, 9272015 became "09272015".
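A small sketch of the padding on its own, using hypothetical integer values rather than the tv.csv data. It shows that "%08d" only adds zeros when a value has fewer than 8 digits, so dates in October through December pass through unchanged.

```r
# Hypothetical FirstAir values as read.csv() would import them (integers)
first_air <- c(9272015L, 12252015L)

# Left-pad with zeros to a width of 8; the result is a character vector
padded <- sprintf("%08d", first_air)
padded         # "09272015" "12252015"
class(padded)  # "character"
```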
Now that it is a string (chr type), we can convert it the same way as in method 1:
> t$FirstAir = as.Date(t$FirstAir, "%m%d%Y")
We do a final check:
> str(t$FirstAir)
Date[1:2], format: "2015-09-27" "2015-09-22"
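As mentioned above, the padding and the conversion can also be done in a single command. A minimal sketch, again using read.csv(text = ...) in place of tv.csv; csv_text and tv are illustrative names.

```r
csv_text <- "Series,FirstAir
Quantico,09272015
Muppets,09222015"

# Default import: FirstAir arrives as an integer with its leading zero dropped
tv <- read.csv(text = csv_text)

# Pad back to 8 characters and parse as mmddyyyy in one step
tv$FirstAir <- as.Date(sprintf("%08d", tv$FirstAir), "%m%d%Y")
str(tv$FirstAir)
```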
In both cases, what were originally values in a text file have now been successfully converted into R Date objects.
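A possible middle ground between the two methods, sketched here as an assumption rather than something from the original answer: read.csv() also accepts a named colClasses vector, so only FirstAir is forced to character while the other columns keep automatic type detection. As before, read.csv(text = ...) stands in for tv.csv, and csv_text and tv are illustrative names.

```r
csv_text <- "Series,FirstAir
Quantico,09272015
Muppets,09222015"

# Only the named column is read as character; unnamed columns are auto-detected
tv <- read.csv(text = csv_text, colClasses = c(FirstAir = "character"))

# The leading zeros are intact, so the date parses directly
tv$FirstAir <- as.Date(tv$FirstAir, "%m%d%Y")
str(tv)
```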
Upvotes: 11
Reputation: 21507
Have a look at the lubridate mdy function:
library(lubridate)
a <- "10281994"
mdy(a)
gives you
[1] "1994-10-28 UTC"
which is of class "POSIXct" "POSIXt", so a datetime in R (thanks Joshua Ulrich for the correction).
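You could use as.Date(mdy(a)), which gives 1994-10-28, to get an object of class Date. In more recent versions of lubridate, mdy() already returns a Date by default, because its tz argument defaults to NULL.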
There are variants like ymd and dmy within lubridate as well.
Upvotes: 6