Reputation: 617
I am trying to convert dates in yyyymmdd format to the year only (yyyy) in R. The question "how to convert numeric only year in Date in R?" has a very interesting answer: it gets R to convert an 8-digit entry (yyyymmdd) to a 4-digit year (yyyy) using the lubridate package, which is very good for me.
In my old code I used round_date() for this:
library(dplyr)      # provides %>% and mutate()
library(lubridate)  # provides dmy() and round_date()
date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
name <- c('A','B','C','D','E')
df <- data.frame(date2, name)
df2 <- df %>%
  mutate(date2 = dmy(date2)) %>%
  mutate(year_date = round_date(date2, 'year'))
df2
str(df2)
date2      name  year_date
<date>     <chr> <date>
2000-01-01 A     2000-01-01
2000-08-08 B     2001-01-01
2001-03-16 C     2001-01-01
2000-12-25 D     2001-01-01
2000-02-29 E     2000-01-01
But I started having problems with my statistical analysis when I discovered, for example, that the date 2000-08-08 was rounded up to 2001-01-01 instead of 2000-01-01 as I expected.
This is a very big problem for me, since information that belongs to the year 2005 has been moved to the year 2006, and my database has more than 1400 rows.
I noticed that dates after the middle of the year (after June) are rounded up to the next year, which is very bad.
How do I round a 2000-08-08 date to just 2000 instead of 2001?
Upvotes: 1
Views: 2056
Reputation: 368251
Doesn't this (simpler, also only base R) operation do what you want?
> date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
> dd <- as.Date(date2, "%d/%m/%Y")
> yd <- format(dd, "%Y-01-01")
> dt <- as.Date(yd)
> D <- data.frame(date2=date2, date=dd, y=yd, d=dt)
> D
       date2       date          y          d
1 01/01/2000 2000-01-01 2000-01-01 2000-01-01
2 08/08/2000 2000-08-08 2000-01-01 2000-01-01
3 16/03/2001 2001-03-16 2001-01-01 2001-01-01
4 25/12/2000 2000-12-25 2000-01-01 2000-01-01
5 29/02/2000 2000-02-29 2000-01-01 2000-01-01
>
In essence we just extract the year component from the (parsed as date) Date object and append -01-01.
Edit: There are also trunc() operations for Date and Datetime objects. Oddly, truncation for years only works for Datetime (see the help page for trunc.Date for more) so this works too:
> as.Date(trunc(as.POSIXlt(dd), "years"))
[1] "2000-01-01" "2000-01-01" "2001-01-01" "2000-01-01" "2000-01-01"
>
Edit 2: We can use that last step in a cleaner / simpler solution: a data.frame with three columns holding the input data (as characters), the parsed data as a proper Date type, and the desired truncated year data, all using base R without further dependencies. Of course, if you wanted to, you could rewrite it with the pipe and lubridate for the same result via a slightly slower route (which only matters for "large" data).
> date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
> pd <- as.Date(date2, "%d/%m/%Y")
> td <- as.Date(trunc(as.POSIXlt(pd), "years"))
> D <- data.frame(input = date2, parsed = pd, output = td)
> D
       input     parsed     output
1 01/01/2000 2000-01-01 2000-01-01
2 08/08/2000 2000-08-08 2000-01-01
3 16/03/2001 2001-03-16 2001-01-01
4 25/12/2000 2000-12-25 2000-01-01
5 29/02/2000 2000-02-29 2000-01-01
>
For real "production" use you may not need the data.frame or the intermediate results, which leads to a one-liner:
> as.Date(trunc(as.POSIXlt( as.Date(date2, "%d/%m/%Y") ), "years"))
[1] "2000-01-01" "2000-01-01" "2001-01-01" "2000-01-01" "2000-01-01"
>
which is likely the most compact and efficient conversion you can get.
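As a minimal sketch of how the two base R routes above could be wrapped for reuse: the function name year_start() and its fmt argument are hypothetical (they do not appear in the answer), and the input is assumed to be character dates in day/month/year order.

```r
# Hypothetical helper: parse character dates and return the first day of
# each date's year as a Date, using only base R.
year_start <- function(x, fmt = "%d/%m/%Y") {
  d <- as.Date(x, format = fmt)                          # parse to Date (NA if unparseable)
  as.Date(format(d, "%Y-01-01"), format = "%Y-%m-%d")    # keep the year, reset month and day
}

date2 <- c("01/01/2000", "08/08/2000", "16/03/2001", "25/12/2000", "29/02/2000")

via_format <- year_start(date2)

# The trunc() route from the answer, for comparison
via_trunc <- as.Date(trunc(as.POSIXlt(as.Date(date2, "%d/%m/%Y")), "years"))

print(via_format)
print(all(via_format == via_trunc))
```

Both routes should agree on this input; which one reads better is largely a matter of taste.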
Upvotes: 6
Reputation: 226192
If you want just the year (and not the date corresponding to the first day of the year) you can use lubridate::year().
df %>% mutate(across(date2,dmy),
year_date=year(date2))
If you do want the first day of the year then floor_date() will do the trick.
df %>% mutate(across(date2,dmy),
year_date=floor_date(date2,"year"))
or if you only need the truncated date you could go directly to mutate(year_date=floor_date(dmy(date2),"year"))
In base R, year() would be format(date2, "%Y"), as shown in @DirkEddelbuettel's answer.
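A small sketch of the base R equivalent, assuming the dates have already been parsed to Date (the variable names dd, yr_chr, yr_int and yr_lt are illustrative only). Note that format() returns character strings, so a conversion is needed if you want a number like the one year() gives you.

```r
date2 <- c("01/01/2000", "08/08/2000", "16/03/2001", "25/12/2000", "29/02/2000")
dd <- as.Date(date2, format = "%d/%m/%Y")   # parse day/month/year strings

yr_chr <- format(dd, "%Y")                  # character: "2000" "2000" "2001" ...
yr_int <- as.integer(yr_chr)                # integer year, comparable to year()

# Alternative: POSIXlt stores the year as an offset from 1900
yr_lt <- as.POSIXlt(dd)$year + 1900L

print(yr_chr)
print(yr_int)
```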
Upvotes: 5
Reputation: 5813
If you consult the round_date help page, you will also see floor_date:
library("lubridate")
library("dplyr")
date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
name <- c('A','B','C','D','E')
df <- data.frame(date2,name)
df2 <- df %>%
mutate(date2 = dmy(date2)) %>%
mutate(year_date = floor_date(date2,'year'))
df2
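To see the difference directly, here is a short side-by-side sketch on the question's dates (the data frame name cmp and its column names are illustrative). round_date() moves each date to the nearest year boundary, so dates in the second half of a year go forward, while floor_date() always goes back to the start of the date's own year.

```r
library(lubridate)

date2 <- c("01/01/2000", "08/08/2000", "16/03/2001", "25/12/2000", "29/02/2000")
parsed <- dmy(date2)

cmp <- data.frame(
  parsed  = parsed,
  rounded = round_date(parsed, "year"),  # nearest 1 January, may be next year
  floored = floor_date(parsed, "year"),  # 1 January of the same year
  year    = year(parsed)                 # numeric year only
)
print(cmp)
```

In the floored column, 2000-08-08 and 2000-12-25 stay in 2000, which is the behaviour the question asks for.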
Upvotes: 3