VM21
VM21

Reputation: 13

R - How to calculate in a new column the difference in seconds between the first and the remaining dates

I have the following dates and I want to calculate the difference between the first date and the other dates. e.g. The difference must be date 2- date 1, date 3 - date 1 etc, in seconds and in another column. Any help is appreciated I am new in R.

"2009-06-01 16:00:00 UTC" 
"2009-06-29 16:00:00 UTC" 
"2009-06-29 17:00:00 UTC"
"2009-06-30 16:00:00 UTC" 
"2009-06-30 17:00:00 UTC" 
"2009-06-30 18:00:00 UTC"
"2009-06-30 19:00:00 UTC" 
"2009-07-01 08:00:00 UTC" 
"2009-07-01 09:00:00 UTC"
"2009-07-01 10:00:00 UTC" 
"2009-07-01 16:00:00 UTC" 
"2009-07-01 17:00:00 UTC"
"2009-07-01 18:00:00 UTC" 
"2009-07-01 19:00:00 UTC" 
"2009-07-02 08:00:00 UTC"
"2009-07-02 09:00:00 UTC" 
"2009-07-02 10:00:00 UTC" 
"2009-07-02 16:00:00 UTC"
"2009-07-02 17:00:00 UTC" 
"2009-07-02 18:00:00 UTC" 
"2009-07-02 19:00:00 UTC"
"2009-07-04 10:00:00 UTC" 
"2009-07-04 16:00:00 UTC" 
"2009-07-04 17:00:00 UTC"
"2010-06-22 16:00:00 UTC" 
"2010-06-22 17:00:00 UTC" 
"2010-06-22 18:00:00 UTC"
"2010-08-20 16:00:00 UTC" 
"2011-06-02 16:00:00 UTC" 
"2011-06-02 17:00:00 UTC"
"2011-06-02 18:00:00 UTC" 
"2011-06-03 10:00:00 UTC" 
"2011-06-03 16:00:00 UTC"
"2011-06-03 17:00:00 UTC" 
"2011-06-03 18:00:00 UTC" 
"2011-06-03 19:00:00 UTC"

Upvotes: 1

Views: 96

Answers (2)

KenHBS
KenHBS

Reputation: 7174

First you'll want to convert your character strings to dates. Once you've done this, you can easily use difftime() to calculate time distances.

There are a number of packages that help you with this and even more ways to do so. So in addition to the answer provided using the lubridate package, here is a way to solve it in base R:

# (I'll assume your data is saved in a vector called my_dates)
my_dates <- gsub(" UTC", "", my_dates)   # removes " UTC" from all your dates (for no reason, see edit below)
my_dates <- as.POSIXlt(df$date)          # converts to date format
difftime(time1 = my_dates, time2 = my_dates[1], units = "sec")
Time differences in secs
# [1]         0  2419200  2422800  2505600  2509200  2512800  2516400  2563200  2566800  2570400  2592000  2595600
# [13]  2599200  2602800  2649600  2653200  2656800  2678400  2682000  2685600  2689200  2829600  2851200  2854800
# [25] 33350400 33354000 33357600 38448000 63158400 63162000 63165600 63223200 63244800 63248400 63252000 63255600

Note: In my initial answer, I used as.Date.character(), but this ignored the times after the dates! as.Date() also ignores the time and only focuses on the dates. POSIXlt() does the job and keeps both the times and the dates.

Edit from comment: Apparently difftime() is clever enough to recognise strings as dates and automatically gets the right format for the dates, too!:

difftime(my_dates, my_dates[1], units = "secs") 
# Time differences in secs
#  [1]        0  2419200  2422800  2505600  2509200  2512800  2516400  2563200  # 2566800  2570400  2592000  2595600
# [13]  2599200  2602800  2649600  2653200  2656800  2678400  2682000  2685600  2689200  2829600  2851200  2854800
# [25] 33350400 33354000 33357600 38448000 63158400 63162000 63165600 63223200 63244800 63248400 63252000 63255600

Upvotes: 2

emilliman5
emilliman5

Reputation: 5966

The lubridate package is your friend in this scenario:

library(lubridate)
d <- read.table(text='"2009-06-01 16:00:00 UTC" 
                "2009-06-29 16:00:00 UTC" 
                "2009-06-29 17:00:00 UTC"
                "2009-06-30 16:00:00 UTC" 
                "2009-06-30 17:00:00 UTC" 
                "2009-06-30 18:00:00 UTC"
                "2009-06-30 19:00:00 UTC" 
                "2009-07-01 08:00:00 UTC" 
                "2009-07-01 09:00:00 UTC"
                "2009-07-01 10:00:00 UTC" 
                "2009-07-01 16:00:00 UTC" 
                "2009-07-01 17:00:00 UTC"
                "2009-07-01 18:00:00 UTC" 
                "2009-07-01 19:00:00 UTC" 
                "2009-07-02 08:00:00 UTC"
                "2009-07-02 09:00:00 UTC" 
                "2009-07-02 10:00:00 UTC" 
                "2009-07-02 16:00:00 UTC"
                "2009-07-02 17:00:00 UTC" 
                "2009-07-02 18:00:00 UTC" 
                "2009-07-02 19:00:00 UTC"
                "2009-07-04 10:00:00 UTC" 
                "2009-07-04 16:00:00 UTC" 
                "2009-07-04 17:00:00 UTC"
                "2010-06-22 16:00:00 UTC" 
                "2010-06-22 17:00:00 UTC" 
                "2010-06-22 18:00:00 UTC"
                "2010-08-20 16:00:00 UTC" 
                "2011-06-02 16:00:00 UTC" 
                "2011-06-02 17:00:00 UTC"
                "2011-06-02 18:00:00 UTC" 
                "2011-06-03 10:00:00 UTC" 
                "2011-06-03 16:00:00 UTC"
                "2011-06-03 17:00:00 UTC" 
                "2011-06-03 18:00:00 UTC" 
                "2011-06-03 19:00:00 UTC"', stringsAsFactors=FALSE)

d <- ymd_hms(d[, 1])

sapply(d, function(x) x-d)

Upvotes: 1

Related Questions