Reputation: 3
I am trying to calculate cumulative time between experiment initiation and each temperature measurement I made during an experiment. My experiment has 50+ samples (i.e. A1,A2,B1,B2 ect) and each sample was initiated at a different time/day.
I want to treat time as a continuous variable so I need to calculate the total (cumulative time in hours) for each temp measurement, as indicated by a total time variable (below).
Is there a way to do this without telling R the start and end time for each sample? How can I get cumulative time for each temperature measurement?
An example of how the data looks is shown below:
Date Time Sample Temp **Total Time**
2017-06-28 13:58 A1 13.67870139 0
2017-06-29 16:08 A1 13.34256319 (time between 06-28 13:58 and 06-29 16:08)
2017-06-30 07:32 A1 12.84005139 (time between 06-28 13:58 and 06-30 07:32)
2017-06-30 19:14 A1 12.84005139 ect
2017-07-01 07:40 A1 11.83983472
2017-07-01 19:20 A1 11.83983472
2017-07-02 07:53 A1 11.75332292
2017-07-02 19:06 A1 11.75332292
2017-07-03 07:45 A1 11.74469167
2017-07-03 19:04 A1 11.74469167
2017-07-04 08:15 A1 11.45409583
Thanks!
PS- I haven't tried anything because I can only find example where there is a "start time" and "end time" col. I guess I could create a start time col for each sample and ask r to calculate the time diff between start time and the time of each measurement.... I don't know how to make R make a start time col for the earliest measurement for each sample. I could spend a whole day doing it in excel but there must bet a better solution!
Upvotes: 0
Views: 109
Reputation: 20463
Here's one way. First, unite the Date
and Time
column. Then, convert it to a timestamp via lubridate::ymd_hm
. Next, pluck the first timestamp. Finally, create two new columns that show the difference from that start date.
library(tidyverse)
library(lubridate)
df <- df %>%
unite("datetime", c("Date", "Time"), sep = " ") %>%
mutate(datetime = ymd_hm(datetime))
start_date <- df$datetime[1]
df %>%
mutate(time_from_start_secs = datetime - start_date,
time_from_start_period = seconds_to_period(time_from_start_secs))
# datetime Sample Temp time_from_start_secs time_from_start_period
# 1 2017-06-28 13:58:00 A1 13.67870 0 secs 0S
# 2 2017-06-29 16:08:00 A1 13.34256 94200 secs 1d 2H 10M 0S
# 3 2017-06-30 07:32:00 A1 12.84005 149640 secs 1d 17H 34M 0S
# 4 2017-06-30 19:14:00 A1 12.84005 191760 secs 2d 5H 16M 0S
# 5 2017-07-01 07:40:00 A1 11.83983 236520 secs 2d 17H 42M 0S
# 6 2017-07-01 19:20:00 A1 11.83983 278520 secs 3d 5H 22M 0S
# 7 2017-07-02 07:53:00 A1 11.75332 323700 secs 3d 17H 55M 0S
# 8 2017-07-02 19:06:00 A1 11.75332 364080 secs 4d 5H 8M 0S
# 9 2017-07-03 07:45:00 A1 11.74469 409620 secs 4d 17H 47M 0S
# 10 2017-07-03 19:04:00 A1 11.74469 450360 secs 5d 5H 6M 0S
# 11 2017-07-04 08:15:00 A1 11.45410 497820 secs 5d 18H 17M 0S
Alternatively, you could use min(datetime)
. In addition, if you need to have the calculation "reset" with each sample
, you will want to use group_by
-- all in one big swoop we would have:
df %>%
unite("datetime", c("Date", "Time"), sep = " ") %>%
group_by(Sample) %>%
mutate(datetime = ymd_hm(datetime),
time_secs = datetime - min(start_date))
Upvotes: 1
Reputation: 2715
This seems a fairly simple data manipulation task. What you are essentially trying is a simple group by and time difference.
The code snippet below does what it takes ::
Step 1: Add date and time to same variable so that we can calculate differences
Step 2: Calculate the difference between current time and minimum time for that group
I updated your data to have the two groups; just to show an example.
library(dplyr) # pipes, group by and mutate
library(lubridate) # times
input_data <- read.table(text = "Date Time Sample Temp
2017-06-28 13:58 A1 13.67870139
2017-06-29 16:08 A1 13.34256319
2017-06-30 07:32 A1 12.84005139
2017-06-30 19:14 A1 12.84005139
2017-07-01 07:40 A1 11.83983472
2017-07-01 19:20 A1 11.83983472
2017-07-02 07:53 A1 11.75332292
2017-07-02 19:06 A2 11.75332292
2017-07-03 07:45 A2 11.74469167
2017-07-03 19:04 A2 11.74469167
2017-07-04 08:15 A2 11.45409583",
header = TRUE, stringsAsFactors = FALSE) %>%
mutate(Actual_Time = paste(Date, Time, sep =" ") %>% parse_date_time(., "%y-%m-%d %H:%M")) %>%
group_by(Sample) %>%
mutate(Total_Time = difftime(Actual_Time, min(Actual_Time), units = "hours"))
# A tibble: 11 x 6
# Groups: Sample [2]
Date Time Sample Temp Actual_Time Total_Time
<chr> <chr> <chr> <dbl> <dttm> <time>
1 2017-06-28 13:58 A1 13.67870 2017-06-28 13:58:00 0.00000 hours
2 2017-06-29 16:08 A1 13.34256 2017-06-29 16:08:00 26.16667 hours
3 2017-06-30 07:32 A1 12.84005 2017-06-30 07:32:00 41.56667 hours
4 2017-06-30 19:14 A1 12.84005 2017-06-30 19:14:00 53.26667 hours
5 2017-07-01 07:40 A1 11.83983 2017-07-01 07:40:00 65.70000 hours
6 2017-07-01 19:20 A1 11.83983 2017-07-01 19:20:00 77.36667 hours
7 2017-07-02 07:53 A1 11.75332 2017-07-02 07:53:00 89.91667 hours
8 2017-07-02 19:06 A2 11.75332 2017-07-02 19:06:00 0.00000 hours
9 2017-07-03 07:45 A2 11.74469 2017-07-03 07:45:00 12.65000 hours
10 2017-07-03 19:04 A2 11.74469 2017-07-03 19:04:00 23.96667 hours
11 2017-07-04 08:15 A2 11.45410 2017-07-04 08:15:00 37.15000 hours
Upvotes: 0