Y.Emily
Y.Emily

Reputation: 23

How to get month/year/day of the week from different date formats in R

I have a data frame that contains a column call "date". However the date formats are distinctively different. Data type is string. I am trying to create "month" "year" and "day of the week" columns from this data column.

dataid     date
1         Tue 11/3
2         Wed 11/4 
3          N/A
4         Monday, February 1, 2016
5         Thursday, March 25, 2015 

What is the best way to do this?

Upvotes: 2

Views: 337

Answers (2)

Paul
Paul

Reputation: 2959

If the day and month are written as characters, then regular expressions can be used within a dplyr::case_when() call:

library(dplyr)

df <- df %>%
  mutate(
    day_of_the_week = case_when(
      grepl("mon", date, ignore.case = T) ~ "mon",
      grepl("tue", date, ignore.case = T) ~ "tues",
      grepl("wed", date, ignore.case = T) ~ "wed",
      grepl("thu", date, ignore.case = T) ~ "thurs",
      grepl("fri", date, ignore.case = T) ~ "fri",
      grepl("sat", date, ignore.case = T) ~ "sat",
      grepl("sun", date, ignore.case = T) ~ "sun",
      T ~ NA_character_
    ),
    month = case_when(
      grepl("jan", date, ignore.case = T) ~ "jan",
      grepl("feb", date, ignore.case = T) ~ "feb",
      grepl("mar", date, ignore.case = T) ~ "mar",
      grepl("apr", date, ignore.case = T) ~ "apr",
      grepl("may", date, ignore.case = T) ~ "may",
      grepl("jun", date, ignore.case = T) ~ "jun",
      grepl("jul", date, ignore.case = T) ~ "jul",
      grepl("aug", date, ignore.case = T) ~ "aug",
      grepl("sep", date, ignore.case = T) ~ "sep",
      grepl("oct", date, ignore.case = T) ~ "oct",
      grepl("nov", date, ignore.case = T) ~ "nov",
      grepl("dec", date, ignore.case = T) ~ "dec",
      T ~ NA_character_
    )
  )

#   dataid                     date day_of_the_week month
# 1      1                 Tue 11/3            tues  <NA>
# 2      2                 Wed 11/4             wed  <NA>
# 3      3                     <NA>            <NA>  <NA>
# 4      4 Monday, February 1, 2016             mon   feb
# 5      5 Thursday, March 25, 2015           thurs   mar

It's a harder to pull out day/month number (you could possibly do it in a similar way for days of month between 13 and 31, but otherwise it's impossible to know if the number is for the day or month).

Data

df <- read.table(text = "
dataid     date
1         'Tue 11/3'
2         'Wed 11/4'
3         N/A
4         'Monday, February 1, 2016'
5         'Thursday, March 25, 2015'",
                 header = T,
                 stringsAsFactors = F,
                 na.strings = "N/A")

Upvotes: 0

TC Zhang
TC Zhang

Reputation: 2797

The robust way is to use lubridate::parse_date_time(), but those dates witout year may be wrongly parsed (you may need to manually edit it).

You may read "help("strptime")" to learn more about how to format orders to parse your date.

p.s. March 25, 2015 is wednesday, not Thursday as in your example data.

library(dplyr)

library(lubridate)


df <- data.table::fread(
"dataid     date
1         'Tue 11/3'
2         'Wed 11/4' 
3         'N/A'
4         'Monday, February 1, 2016'
5         'Thursday, March 25, 2015'
",quote="\'")

df.new <- df %>%
  mutate(
    date2 =lubridate::parse_date_time(x =date, orders = c("%a %m/%d", "%A, %B %d, %Y"))
  )
#> Warning: 1 failed to parse.


df.new
#>   dataid                     date      date2
#> 1      1                 Tue 11/3 2018-11-03
#> 2      2                 Wed 11/4 2018-11-04
#> 3      3                      N/A       <NA>
#> 4      4 Monday, February 1, 2016 2016-02-01
#> 5      5 Thursday, March 25, 2015 2015-03-25

Created on 2018-10-08 by the reprex package (v0.2.1)

from there you can extract year, month, day of week like this:

df.new %>%
  mutate(
    year = lubridate::year(date2),
    month = lubridate::month(date2),
    day_of_week = weekdays(date2)
  )

  #  dataid                     date      date2 year month day_of_week
  #1      1                 Tue 11/3 2018-11-03 2018    11    Saturday
  #2      2                 Wed 11/4 2018-11-04 2018    11      Sunday
  #3      3                      N/A       <NA>   NA    NA        <NA>
  #4      4 Monday, February 1, 2016 2016-02-01 2016     2      Monday
  #5      5 Thursday, March 25, 2015 2015-03-25 2015     3   Wednesday

Upvotes: 2

Related Questions