Jayajit
Jayajit

Reputation: 25

Date standardisation in a dataframe in R

In my dataframe, I have dates of the form

YYYY-MM-DD
YYYY.MM.DD
YYYY-MM-DD HH:MM
I want to standardise it into the form:
YYYY-MM-DD
in R.

I've tried the parse_date_time() function in R, but not all the columns are parsed. Why is that so? Any help would be appreciated :-)

Edit
An example for the usage of

parse_date_time(emails$mail_sent_date)

is this

[1] NA               "2010-04-17 UTC" "2012-01-26 UTC" "2014-11-15 UTC" "2014-07-17 UTC" "2010-02-22 UTC" "2010-07-17 UTC" "2012-10-27 UTC" "2014-01-18 UTC"
 [10] "2010-01-11 UTC" NA               "2010-11-28 UTC" "2012-09-24 UTC" "2014-05-30 UTC" "2014-05-30 UTC" "2010-07-31 UTC" "2007-07-28 UTC" NA              
 [19] "2014-08-29 UTC" "2015-06-05 UTC" "2008-11-03 UTC" "2018-03-18 UTC" "2019-01-12 UTC" "2011-07-23 UTC" NA               "2007-11-19 UTC" "2019-04-07 UTC"
 [28] "2010-11-28 UTC" "2019-11-22 UTC" "2019-03-28 UTC" "2013-06-22 UTC" "2013-12-08 UTC" "2012-06-08 UTC" "2011-12-09 UTC" "2017-10-23 UTC" "2017-03-26 UTC"
 [37] "2019-01-31 UTC" "2020-03-14 UTC" "2014-05-30 UTC" "2011-12-31 UTC" "2015-05-14 UTC" "2010-03-27 UTC" "2014-12-08 UTC" "2015-05-24 UTC" "2014-11-15 UTC"
 [46] NA               "2018-05-26 UTC" "2019-02-28 UTC" NA               "2015-06-11 UTC" "2012-06-09 UTC" "2013-06-16 UTC" NA               "2014-07-12 UTC"
 [55] "2012-09-20 UTC" "2010-05-22 UTC" "2019-11-07 UTC" "2011-03-07 UTC" "2007-10-05 UTC" "2018-03-17 UTC" "2007-06-22 UTC" "2007-02-01 UTC" "2020-03-29 UTC"
 [64] "2010-03-21 UTC" "2019-02-28 UTC" NA               "2008-03-17 UTC" "2013-03-14 UTC" "2014-05-12 UTC" "2015-12-19 UTC" "2010-04-05 UTC" NA              
 [73] "2008-02-07 UTC" "2007-08-12 UTC" "2011-12-02 UTC" "2014-02-02 UTC" "2011-07-25 UTC" "2014-06-12 UTC" NA               NA               "2013-10-06 UTC"
 [82] "2019-05-18 UTC" "2011-12-19 UTC" NA               "2012-03-18 UTC" "2013-07-22 UTC" "2017-01-21 UTC" "2013-09-26 UTC" "2019-04-18 UTC" "2012-10-01 UTC"
 [91] "2018-09-01 UTC" "2019-11-22 UTC" "2013-07-05 UTC" "2013-07-22 UTC" "2008-10-11 UTC" "2018-04-29 UTC" NA               "2019-06-24 UTC" "2018-04-19 UTC"
[100] "2015-08-21 UTC" NA               NA               "2015-04-09 UTC" "2012-02-11 UTC" "2011-11-13 UTC" "2013-04-11 UTC" "2007-10-07 UTC" "2007-10-08 UTC"
[109] "2012-01-14 UTC" "2012-06-02 UTC" "2011-07-04 UTC" "2019-05-17 UTC" "2012-09-09 UTC" NA               "2018-09-29 UTC" "2015-06-04 UTC" "2014-01-13 UTC"
[118] "2014-01-13 UTC" "2012-09-24 UTC" "2018-05-28 UTC" "2018-07-21 UTC" "2010-04-26 UTC" "2011-02-20 UTC" "2013-06-21 UTC" "2008-12-14 UTC" "2011-04-25 UTC"
[127] "2014-07-31 UTC" "2015-06-08 UTC" "2015-10-25 UTC" "2019-06-29 UTC" "2011-02-21 UTC" "2017-01-09 UTC" NA               "2015-06-21 UTC" "2014-07-28 UTC"
[136] "2013-11-04 UTC" "2014-07-24 UTC" NA               "2019-09-13 UTC" "2007-06-09 UTC" "2014-12-13 UTC" "2015-10-16 UTC" "2010-06-19 UTC" "2015-05-14 UTC"
[145] "2011-07-29 UTC" "2007-10-01 UTC" NA               NA               "2010-09-25 UTC" "2010-04-15 UTC" "2020-03-05 UTC" "2017-06-30 UTC" NA              
[154] "2019-06-10 UTC" "2018-10-04 UTC" "2015-05-11 UTC" "2010-05-22 UTC" "2014-07-26 UTC" "2015-01-25 UTC" "2015-07-04 UTC" "2015-07-04 UTC" "2014-07-17 UTC"
[163] "2010-09-18 UTC" "2007-01-08 UTC" "2019-10-21 UTC" "2014-06-30 UTC" "2008-08-01 UTC" NA               "2010-08-13 UTC" NA               NA              
[172] "2012-11-24 UTC" "2014-11-20 UTC" "2018-05-14 UTC" "2015-10-05 UTC" "2020-01-26 UTC" "2018-04-21 UTC" "2011-07-04 UTC" "2015-02-22 UTC" "2015-02-22 UTC"
[181] "2008-10-11 UTC" "2017-01-05 UTC" "2011-05-21 UTC" NA               "2015-09-27 UTC" "2011-08-28 UTC" "2019-03-09 UTC" "2018-11-29 UTC" "2014-07-11 UTC"
[190] "2013-06-14 UTC" "2018-06-04 UTC" "2014-11-03 UTC" "2019-03-01 UTC" "2007-10-12 UTC" "2018-01-06 UTC" NA               "2010-11-28 UTC" "2017-10-23 UTC"
[199] "2014-03-23 UTC" "2018-11-11 UTC" "2019-05-18 UTC" "2014-10-02 UTC" NA               NA               "2011-07-31 UTC" "2010-07-16 UTC" "2015-04-09 UTC"
[208] "2015-10-01 UTC" "2015-10-09 UTC" "2011-04-01 UTC" "2018-11-11 UTC" "2018-11-11 UTC" "2011-08-28 UTC" "2018-07-21 UTC" NA               "2011-02-21 UTC"
[217] "2018-03-17 UTC" NA               "2014-05-11 UTC" "2012-03-23 UTC" "2014-05-25 UTC" "2014-03-23 UTC" "2013-01-20 UTC" NA               "2014-07-11 UTC"
[226] "2014-09-08 UTC" "2013-05-24 UTC" NA               "2010-07-17 UTC" NA               "2019-01-01 UTC" NA               "2013-06-15 UTC" "2019-01-19 UTC"
[235] "2020-02-02 UTC" "2013-03-14 UTC" "2012-08-04 UTC" "2015-02-13 UTC" "2010-06-18 UTC" NA               "2013-10-20 UTC" "2015-12-17 UTC" "2017-09-01 UTC"
[244] "2013-03-28 UTC" "2010-04-01 UTC" "2017-07-24 UTC" "2007-09-30 UTC" "2017-05-27 UTC" NA               "2006-11-17 UTC" "2007-11-18 UTC" "2019-12-01 UTC"
[253] "2015-10-12 UTC" "2015-03-27 UTC" "2017-12-02 UTC" "2018-09-03 UTC" "2018-03-04 UTC" "2015-03-14 UTC" NA               "2010-01-25 UTC" "2008-07-04 UTC"
[262] "2015-04-29 UTC" "2013-04-05 UTC" NA               "2007-11-02 UTC" "2010-06-13 UTC" "2019-02-16 UTC" "2015-04-09 UTC" "2013-07-27 UTC" NA              
[271] "2018-08-25 UTC" "2019-06-14 UTC"
Warning message:
 39 failed to parse. 

A similar warning message returned when I used ymd()

Upvotes: 0

Views: 59

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269870

1) Assuming that the formats are precisely the ones shown in the question (if not please fix the question) then this uses only base R. This makes use of the fact that as.Date will ignore junk at the end.

x <- c("2000-10-01", "2000.10.01", "2000-10-01 03:04")

as.Date(chartr(".", "-", x))
## [1] "2000-10-01" "2000-10-01" "2000-10-01"

2) Another approach is the anytime package:

library(anytime)

anydate(x)
## [1] "2000-10-01" "2000-10-01" "2000-10-01"

Upvotes: 1

TarJae
TarJae

Reputation: 79164

Use lubridate package and ymd function

library(lubridate)
ymd(column_of_your_dataframe)

Upvotes: 0

Related Questions