Dave Kincaid
Dave Kincaid

Reputation: 4160

Extract date elements from POSIXlt and put into data frame in R

My second question of the day and my last attempt to use R to clean up this data. Here's the sitrep:

I have a data frame that has a column which is a POSIXlt date type. I want to extract the day, month and year from that column and create 3 new columns called (cleverly) day, month and year.

The data frame looks like this:

order_id      dd_mmm_yy
   1          2005-07-28
   2          2007-03-04

I want to end up with this:

order_id      dd_mmm_yy    day   month   year
   1          2005-07-28    28     7     2005
   2          2007-03-04    4      3     2007

I've created a function to extract the day, month and year and return them in a list (or data frame, I've tried both).

extractdate = function (date) {
        day = format(date, format="%d")
        month = format(date, format="%m")
        year = format(date, format="%Y")

       list(day=day, month=month, year=year)
 }

Here's what I've tried based on an earlier problem and question:

cbind(orders, t(sapply(orders$dd_mmm_yy, extractdate)))

which gives me this:

Error in data.frame(..., check.names = FALSE) : 
arguments imply differing number of rows: 5, 9

The t(sapply... by itself gives me this for some crazy reason:

      day         month       year       
sec   Character,5 Character,5 Character,5
min   Character,5 Character,5 Character,5
hour  Character,5 Character,5 Character,5
mday  Character,5 Character,5 Character,5
mon   Character,5 Character,5 Character,5
year  Character,5 Character,5 Character,5
wday  Character,5 Character,5 Character,5
yday  Character,5 Character,5 Character,5
isdst Character,5 Character,5 Character,5

What on earth is going on? Am I better off using something like Python or Java to do all the data manipulation I need to do on this data before bringing it into R for analysis?

Upvotes: 20

Views: 34109

Answers (4)

Feng Jiang
Feng Jiang

Reputation: 1945

If you are using the data.table package, it already has functions to extract data time components from POSIXct.

second(x)
minute(x)
hour(x)
yday(x)
wday(x)
mday(x)
week(x)
isoweek(x)
month(x)
quarter(x)
year(x)

The use is straightforward (e.g. dt[, day := day(dd_mmm_yy)]). You can see the full documentation here.

Upvotes: 1

Ramnath
Ramnath

Reputation: 55695

One liner using lubridate

require(plyr); require(lubridate)
mutate(mydf, date = ymd(dd_mmm_yy), day = day(date), 
  month = month(date), year = year(date))

  order_id  dd_mmm_yy       date day month year
1        1 2005-07-28 2005-07-28  28     7 2005
2        2 2007-03-04 2007-03-04   4     3 2007

Upvotes: 10

Joshua Ulrich
Joshua Ulrich

Reputation: 176638

POSIXlt objects are a list of 9 components (see the Details section of ?POSIXlt for more information). Because the dd_mmm_yy column is POSIXlt, you don't need a function to extract the components. You can just extract the components by their names:

orders$day <- orders$dd_mmm_yy$mday        # day of month
orders$month <- orders$dd_mmm_yy$mon+1     # month of year (zero-indexed)
orders$year <- orders$dd_mmm_yy$year+1900  # years since 1900
orders
#   order_id  dd_mmm_yy day month year
# 1        1 2005-07-28  28     7 2005
# 2        2 2007-03-04   4     3 2007

Upvotes: 28

jrara
jrara

Reputation: 16981

Try this (DF as your data.frame):

extractdate <- function(date) {
    day <- format(date, format="%d")
    month <- format(date, format="%m")
    year <- format(date, format="%Y")

    cbind(day, month, year)
}

cbind(DF, extractdate(DF$dd_mmm_yy))

Upvotes: 4

Related Questions