Reputation: 23
I have a data frame that contains a column call "date". However the date formats are distinctively different. Data type is string. I am trying to create "month" "year" and "day of the week" columns from this data column.
dataid date
1 Tue 11/3
2 Wed 11/4
3 N/A
4 Monday, February 1, 2016
5 Thursday, March 25, 2015
What is the best way to do this?
Upvotes: 2
Views: 337
Reputation: 2959
If the day and month are written as characters, then regular expressions can be used within a dplyr::case_when()
call:
library(dplyr)
df <- df %>%
mutate(
day_of_the_week = case_when(
grepl("mon", date, ignore.case = T) ~ "mon",
grepl("tue", date, ignore.case = T) ~ "tues",
grepl("wed", date, ignore.case = T) ~ "wed",
grepl("thu", date, ignore.case = T) ~ "thurs",
grepl("fri", date, ignore.case = T) ~ "fri",
grepl("sat", date, ignore.case = T) ~ "sat",
grepl("sun", date, ignore.case = T) ~ "sun",
T ~ NA_character_
),
month = case_when(
grepl("jan", date, ignore.case = T) ~ "jan",
grepl("feb", date, ignore.case = T) ~ "feb",
grepl("mar", date, ignore.case = T) ~ "mar",
grepl("apr", date, ignore.case = T) ~ "apr",
grepl("may", date, ignore.case = T) ~ "may",
grepl("jun", date, ignore.case = T) ~ "jun",
grepl("jul", date, ignore.case = T) ~ "jul",
grepl("aug", date, ignore.case = T) ~ "aug",
grepl("sep", date, ignore.case = T) ~ "sep",
grepl("oct", date, ignore.case = T) ~ "oct",
grepl("nov", date, ignore.case = T) ~ "nov",
grepl("dec", date, ignore.case = T) ~ "dec",
T ~ NA_character_
)
)
# dataid date day_of_the_week month
# 1 1 Tue 11/3 tues <NA>
# 2 2 Wed 11/4 wed <NA>
# 3 3 <NA> <NA> <NA>
# 4 4 Monday, February 1, 2016 mon feb
# 5 5 Thursday, March 25, 2015 thurs mar
It's a harder to pull out day/month number (you could possibly do it in a similar way for days of month between 13 and 31, but otherwise it's impossible to know if the number is for the day or month).
df <- read.table(text = "
dataid date
1 'Tue 11/3'
2 'Wed 11/4'
3 N/A
4 'Monday, February 1, 2016'
5 'Thursday, March 25, 2015'",
header = T,
stringsAsFactors = F,
na.strings = "N/A")
Upvotes: 0
Reputation: 2797
The robust way is to use lubridate::parse_date_time()
, but those dates witout year may be wrongly parsed (you may need to manually edit it).
You may read "help("strptime")" to learn more about how to format orders to parse your date.
p.s. March 25, 2015 is wednesday, not Thursday as in your example data.
library(dplyr)
library(lubridate)
df <- data.table::fread(
"dataid date
1 'Tue 11/3'
2 'Wed 11/4'
3 'N/A'
4 'Monday, February 1, 2016'
5 'Thursday, March 25, 2015'
",quote="\'")
df.new <- df %>%
mutate(
date2 =lubridate::parse_date_time(x =date, orders = c("%a %m/%d", "%A, %B %d, %Y"))
)
#> Warning: 1 failed to parse.
df.new
#> dataid date date2
#> 1 1 Tue 11/3 2018-11-03
#> 2 2 Wed 11/4 2018-11-04
#> 3 3 N/A <NA>
#> 4 4 Monday, February 1, 2016 2016-02-01
#> 5 5 Thursday, March 25, 2015 2015-03-25
Created on 2018-10-08 by the reprex package (v0.2.1)
from there you can extract year, month, day of week like this:
df.new %>%
mutate(
year = lubridate::year(date2),
month = lubridate::month(date2),
day_of_week = weekdays(date2)
)
# dataid date date2 year month day_of_week
#1 1 Tue 11/3 2018-11-03 2018 11 Saturday
#2 2 Wed 11/4 2018-11-04 2018 11 Sunday
#3 3 N/A <NA> NA NA <NA>
#4 4 Monday, February 1, 2016 2016-02-01 2016 2 Monday
#5 5 Thursday, March 25, 2015 2015-03-25 2015 3 Wednesday
Upvotes: 2