wesleysc352

Reputation: 617

Rounding dates with round_date() in R

I am trying to convert dates in yyyymmdd format to yyyy only in R. The question "how to convert numeric only year in Date in R?" has a very interesting answer: it gets R to convert an 8-digit entry (yyyymmdd) into a 4-digit year (yyyy) with the lubridate package, which is very useful for me.
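For the yyyymmdd-to-yyyy step itself, a minimal base R sketch is below. It assumes the dates are stored as 8-digit integers; the `ymd_num` vector is made-up illustration data, not taken from the linked question. Neither route rounds anything, so no date can move into the following year.

```r
# Hypothetical input: dates stored as 8-digit integers in yyyymmdd form
ymd_num <- c(20000101L, 20000808L, 20010316L, 20001225L, 20000229L)

# Option 1: integer division by 10000 drops the mmdd part
year_int <- ymd_num %/% 10000L

# Option 2: parse as a Date first, then keep only the year (as character)
parsed   <- as.Date(as.character(ymd_num), format = "%Y%m%d")
year_chr <- format(parsed, "%Y")

year_int
year_chr
```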

In my old code I used round_date() for this:

    library(dplyr)
    library(lubridate)

    date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
    name <- c('A','B','C','D','E')

    df <- data.frame(date2, name)

    df2 <- df %>%
      mutate(date2 = dmy(date2)) %>%
      mutate(year_date = round_date(date2, 'year'))

    df2
    str(df2)

    date2 <date>  name <chr>  year_date <date>
    2000-01-01    A           2000-01-01
    2000-08-08    B           2001-01-01
    2001-03-16    C           2001-01-01
    2000-12-25    D           2001-01-01
    2000-02-29    E           2000-01-01

But I started to have problems with my statistical analysis when I discovered, for example, that the date 2000-08-08 was rounded up to 2001-01-01 instead of 2000-01-01 as I expected.

This is a very big problem for me: I have more than 1400 rows in my database, and information that belongs to the year 2005, for example, has been moved to the year 2006.

I noticed that dates after the middle of the year (after June) are rounded up to the next year, which is very bad for my analysis.
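A minimal sketch of why this happens, assuming lubridate is installed: round_date() moves each date to whichever year boundary is nearer, either floor_date() (1 January of the same year) or ceiling_date() (1 January of the next year). The three dates are taken from the example data above; the variable names are illustrative only.

```r
library(lubridate)

# Three dates from the example data above
d <- as.Date(c("2000-02-29", "2000-08-08", "2000-12-25"))

below   <- floor_date(d, "year")    # 1 January of the same year
above   <- ceiling_date(d, "year")  # 1 January of the next year
rounded <- round_date(d, "year")    # what the code above uses

# Distance in days to each boundary; round_date() picks the closer one
days_below <- as.numeric(d - below)
days_above <- as.numeric(above - d)
data.frame(d, days_below, days_above, rounded)
```

For 2000-08-08 the next 1 January is 146 days away versus 220 days back to the previous one, so rounding sends it to 2001-01-01.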

How do I round a 2000-08-08 date to just 2000 instead of 2001?

Upvotes: 1

Views: 2056

Answers (3)

Dirk is no longer here

Reputation: 368251

Doesn't this (simpler, also only base R) operation do what you want?

> date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
> dd <- as.Date(date2, "%d/%m/%Y")
> yd <- format(dd, "%Y-01-01")
> dt <- as.Date(yd)
> D <- data.frame(date2=date2, date=dd, y=yd, d=dt)
> D
       date2       date          y          d   
1 01/01/2000 2000-01-01 2000-01-01 2000-01-01
2 08/08/2000 2000-08-08 2000-01-01 2000-01-01
3 16/03/2001 2001-03-16 2001-01-01 2001-01-01
4 25/12/2000 2000-12-25 2000-01-01 2000-01-01
5 29/02/2000 2000-02-29 2000-01-01 2000-01-01
>   

In essence, we just extract the year component from the parsed Date object and append -01-01.

Edit: There are also trunc() operations for Date and Datetime objects. Oddly, truncation to years only works for Datetime (see the help page for trunc.Date for more), so this works too:
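If a numeric year is wanted rather than a Date, the same year component can also be read from a POSIXlt object. A minimal base R sketch (the names `yr_num`, `yr_chr` and `first_day` are only illustrative):

```r
date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
dd <- as.Date(date2, "%d/%m/%Y")

# POSIXlt stores the year as an offset from 1900
yr_num <- as.POSIXlt(dd)$year + 1900L

# The same year component via format(), as a character string
yr_chr <- format(dd, "%Y")

# Rebuild the first day of that year by appending -01-01
first_day <- as.Date(paste0(yr_chr, "-01-01"))
```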

> as.Date(trunc(as.POSIXlt(dd), "years"))
[1] "2000-01-01" "2000-01-01" "2001-01-01" "2000-01-01" "2000-01-01"
> 

Edit 2: We can use that last step in a cleaner, simpler solution: a data.frame with three columns holding the input data (as characters), the data parsed as a proper Date type, and the desired truncated year, all using base R without further dependencies. Of course, if you wanted to, you could rewrite it with the pipe and lubridate and get the same result via a slightly slower route (which only matters for "large" data).

> date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
> pd <- as.Date(date2, "%d/%m/%Y")
> td <- as.Date(trunc(as.POSIXlt(pd), "years"))
> D <- data.frame(input = date2, parsed = pd, output = td)
> D
       input     parsed     output
1 01/01/2000 2000-01-01 2000-01-01
2 08/08/2000 2000-08-08 2000-01-01
3 16/03/2001 2001-03-16 2001-01-01
4 25/12/2000 2000-12-25 2000-01-01
5 29/02/2000 2000-02-29 2000-01-01
> 
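A minimal sketch of the lubridate route mentioned above, assuming lubridate is installed, checked against the base R result (the names `td_base` and `td_lub` are only illustrative):

```r
library(lubridate)

date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
pd <- as.Date(date2, "%d/%m/%Y")

# Base R: truncate via POSIXlt, then convert back to Date
td_base <- as.Date(trunc(as.POSIXlt(pd), "years"))

# lubridate: dmy() parses, floor_date() rounds down to the start of the year
td_lub <- floor_date(dmy(date2), "year")

# Both routes should give the same dates
all(td_base == td_lub)
```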

For real "production" use you may not need the data.frame or the intermediate results, which leads to a one-liner:

> as.Date(trunc(as.POSIXlt( as.Date(date2, "%d/%m/%Y") ), "years"))
[1] "2000-01-01" "2000-01-01" "2001-01-01" "2000-01-01" "2000-01-01"
> 

which is likely the most compact and efficient conversion you can get.

Upvotes: 6

Ben Bolker

Reputation: 226192

If you want just the year (and not the date corresponding to the first day of the year) you can use lubridate::year().

df %>% mutate(across(date2,dmy),
              year_date=year(date2))

If you do want the first day of the year then floor_date() will do the trick.

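df %>% mutate(across(date2,dmy),
              year_date=floor_date(date2,"year"))

or, if you only need the truncated date, you could go directly to mutate(year_date=floor_date(dmy(date2),"year")). Note that the unit has to be given explicitly, since floor_date() defaults to rounding down to the second.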

In base R, year() would be format(date2, "%Y"), as shown in @DirkEddelbuettel's answer.
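A minimal sketch comparing the two, assuming lubridate is installed: format(x, "%Y") returns a character string, so it needs as.integer() to match the numeric result of year(). The variable names are only illustrative.

```r
library(lubridate)

d <- dmy(c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000'))

y_lub  <- year(d)                      # numeric year from lubridate
y_base <- as.integer(format(d, "%Y"))  # base R: format() gives character, so convert

all(y_lub == y_base)
```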

Upvotes: 5

tpetzoldt

Reputation: 5813

If you consult the round_date help page, you will also see floor_date:

library("lubridate")
library("dplyr")

date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
name <- c('A','B','C','D','E')

df <- data.frame(date2,name)

df2 <- df %>%
  mutate(date2 = dmy(date2)) %>%
  mutate(year_date = floor_date(date2,'year'))

df2
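Given the concern in the question about more than 1400 rows, a quick check after switching to floor_date() can confirm that no row changed year. A minimal sketch on the example data from the question; the check and the per-year count are just one possible approach.

```r
library(dplyr)
library(lubridate)

df <- data.frame(
  date2 = c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000'),
  name  = c('A','B','C','D','E')
)

# mutate() evaluates sequentially, so year_date uses the parsed date2
df2 <- df %>%
  mutate(date2 = dmy(date2),
         year_date = floor_date(date2, "year"))

# Sanity check: floor_date() never moves a row into another year
stopifnot(all(year(df2$year_date) == year(df2$date2)))

# Rows per year
table(year(df2$year_date))
```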

Upvotes: 3
