Reputation: 4160
My second question of the day and my last attempt to use R to clean up this data. Here's the sitrep:
I have a data frame that has a column which is a POSIXlt date type. I want to extract the day, month and year from that column and create 3 new columns called (cleverly) day, month and year.
The data frame looks like this:
order_id dd_mmm_yy
1 2005-07-28
2 2007-03-04
I want to end up with this:
order_id dd_mmm_yy day month year
1 2005-07-28 28 7 2005
2 2007-03-04 4 3 2007
I've created a function to extract the day, month and year and return them in a list (or data frame, I've tried both).
extractdate = function (date) {
day = format(date, format="%d")
month = format(date, format="%m")
year = format(date, format="%Y")
list(day=day, month=month, year=year)
}
Here's what I've tried based on an earlier problem and question:
cbind(orders, t(sapply(orders$dd_mmm_yy, extractdate)))
which gives me this:
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 5, 9
The t(sapply... by itself gives me this for some crazy reason:
day month year
sec Character,5 Character,5 Character,5
min Character,5 Character,5 Character,5
hour Character,5 Character,5 Character,5
mday Character,5 Character,5 Character,5
mon Character,5 Character,5 Character,5
year Character,5 Character,5 Character,5
wday Character,5 Character,5 Character,5
yday Character,5 Character,5 Character,5
isdst Character,5 Character,5 Character,5
What on earth is going on? Am I better off using something like Python or Java to do all the data manipulation I need to do on this data before bringing it into R for analysis?
Upvotes: 20
Views: 34109
Reputation: 1945
If you are using the data.table
package, it already has functions to extract data time components from POSIXct.
second(x)
minute(x)
hour(x)
yday(x)
wday(x)
mday(x)
week(x)
isoweek(x)
month(x)
quarter(x)
year(x)
The use is straightforward (e.g. dt[, day := day(dd_mmm_yy)]
). You can see the full documentation here.
Upvotes: 1
Reputation: 55695
One liner using lubridate
require(plyr); require(lubridate)
mutate(mydf, date = ymd(dd_mmm_yy), day = day(date),
month = month(date), year = year(date))
order_id dd_mmm_yy date day month year
1 1 2005-07-28 2005-07-28 28 7 2005
2 2 2007-03-04 2007-03-04 4 3 2007
Upvotes: 10
Reputation: 176638
POSIXlt
objects are a list of 9 components (see the Details section of ?POSIXlt
for more information). Because the dd_mmm_yy
column is POSIXlt
, you don't need a function to extract the components. You can just extract the components by their names:
orders$day <- orders$dd_mmm_yy$mday # day of month
orders$month <- orders$dd_mmm_yy$mon+1 # month of year (zero-indexed)
orders$year <- orders$dd_mmm_yy$year+1900 # years since 1900
orders
# order_id dd_mmm_yy day month year
# 1 1 2005-07-28 28 7 2005
# 2 2 2007-03-04 4 3 2007
Upvotes: 28
Reputation: 16981
Try this (DF as your data.frame):
extractdate <- function(date) {
day <- format(date, format="%d")
month <- format(date, format="%m")
year <- format(date, format="%Y")
cbind(day, month, year)
}
cbind(DF, extractdate(DF$dd_mmm_yy))
Upvotes: 4