Reputation: 3699
I have the dataset below with date-times and the corresponding values. Readings are taken every 10 minutes, and I need to generate new rows at 15-minute intervals.
For example, the value at 15:40 is 599 and the value at 15:50 is 594, so a new row needs to be generated between the two, i.e. 15:45, holding the average of 599 and 594, which is 596.5.
In other words, the value for :15 (say 16:15) is the average of the :10 and :20 values, and the value for :45 (say 16:45) is the average of the :40 and :50 values. The values at :00 and :30 remain the same.
Date...Time RA.CO2
6/15/2017 15:40 599
6/15/2017 15:50 594
6/15/2017 16:00 606
6/15/2017 16:10 594
6/15/2017 16:20 594
6/15/2017 16:30 594
6/15/2017 16:40 594
6/15/2017 16:50 594
6/16/2017 0:00 496.25
6/16/2017 0:10 500
6/16/2017 0:20 496.25
6/16/2017 0:30 496.25
6/16/2017 0:40 600
6/16/2017 0:50 650
6/16/2017 1:00 700
str(df)
'data.frame': 6092 obs. of 2 variables:
$ Date...Time: chr "6/15/2017 15:40" "6/15/2017 15:50" "6/15/2017 16:00" "6/15/2017 16:10" ...
$ RA.CO2 : num 599 594 606 594 594 594 594 594 594 594 ...
Desired output:
Date...Time RA.CO2
6/15/2017 15:45 596.5
6/15/2017 16:00 606
6/15/2017 16:15 594
6/15/2017 16:30 594
6/15/2017 16:45 594
6/16/2017 0:00 496.25
6/16/2017 0:15 498.125
6/16/2017 0:30 496.25
6/16/2017 0:45 625
6/16/2017 1:00 700
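A minimal base-R sketch of the rule described above, assuming the Date...Time strings are in m/d/yyyy h:mm format and can be parsed as UTC: rows at :00 and :30 are kept, and a :15 or :45 row is added wherever a :10 or :40 reading is followed by a reading exactly 10 minutes later. The helper name to_15min is hypothetical.

```r
# Hypothetical helper: resample 10-minute readings onto a :00/:15/:30/:45 grid.
to_15min <- function(times, values) {
  mins <- as.POSIXlt(times, tz = "UTC")$min
  n <- length(times)

  # :00 and :30 readings already lie on the 15-minute grid, so keep them as-is
  keep <- mins %in% c(0, 30)

  # each :10 or :40 reading whose next reading is exactly 10 minutes later
  i <- which(mins[-n] %in% c(10, 40) &
               as.numeric(diff(times), units = "mins") == 10)

  # the new :15/:45 row is 5 minutes later and takes the mean of the pair
  out <- data.frame(
    Date...Time = c(times[keep], times[i] + 5 * 60),
    RA.CO2 = c(values[keep], (values[i] + values[i + 1]) / 2)
  )
  out <- out[order(out$Date...Time), ]
  rownames(out) <- NULL
  out
}

# The sample data from the question
df <- data.frame(
  Date...Time = c("6/15/2017 15:40", "6/15/2017 15:50", "6/15/2017 16:00",
                  "6/15/2017 16:10", "6/15/2017 16:20", "6/15/2017 16:30",
                  "6/15/2017 16:40", "6/15/2017 16:50", "6/16/2017 0:00",
                  "6/16/2017 0:10", "6/16/2017 0:20", "6/16/2017 0:30",
                  "6/16/2017 0:40", "6/16/2017 0:50", "6/16/2017 1:00"),
  RA.CO2 = c(599, 594, 606, 594, 594, 594, 594, 594,
             496.25, 500, 496.25, 496.25, 600, 650, 700),
  stringsAsFactors = FALSE
)

# Parse the timestamps (UTC is an assumption, chosen to avoid DST shifts)
DT <- as.POSIXct(df$Date...Time, format = "%m/%d/%Y %H:%M", tz = "UTC")
res <- to_15min(DT, df$RA.CO2)
print(res)
```

With the sample data this gives the ten rows listed under the desired output (15:45 through 1:00).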
Upvotes: 2
Views: 162
Reputation: 270055
Here are some solutions. I have re-read the question and am assuming that new intermediate times should only be inserted before times that are 20 or 50 minutes after the hour, and in both cases the immediately prior time (before inserting the intermediate time) must be 10 minutes earlier. If that is not the intention of the question, then the vector of intermediate times (named it in the code below) will need to be changed from what is shown.
1) zoo. Merge df with a data frame holding the intermediate times it, and then run na.approx from the zoo package on the RA column to fill in the NA values:
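library(zoo)
# intermediate times: 5 min before each :20/:50 row that comes 10 min after its predecessor
it <- with(df, DT[c(FALSE, diff(DT) == 10) & as.POSIXlt(DT)$min %in% c(20, 50)] - 5 * 60)
# add the new times as rows with RA = NA, then fill them by linear interpolation
M <- merge(df, data.frame(DT = it), all = TRUE)
transform(M, RA = na.approx(RA))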
giving:
DT RA
1 2017-06-15 15:40:00 599.00
2 2017-06-15 15:45:00 596.50
3 2017-06-15 15:50:00 594.00
4 2017-06-15 16:00:00 606.00
5 2017-06-15 16:10:00 594.00
6 2017-06-15 16:15:00 594.00
7 2017-06-15 16:20:00 594.00
8 2017-06-15 16:30:00 594.00
9 2017-06-15 16:40:00 594.00
10 2017-06-15 16:45:00 594.00
11 2017-06-15 16:50:00 594.00
12 2017-06-16 00:00:00 496.25
13 2017-06-16 00:10:00 496.25
14 2017-06-16 00:15:00 496.25
15 2017-06-16 00:20:00 496.25
16 2017-06-16 00:30:00 496.25
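To see which intermediate times the it expression selects, the sketch below rebuilds the 12-row input listed at the end of this answer (written out as strings and parsed as UTC, which is an assumption; the original uses the local time zone) and evaluates the same expression.

```r
# Rebuild the 12-row input; tz = "UTC" is an assumption made here
DT <- as.POSIXct(c("2017-06-15 15:40", "2017-06-15 15:50", "2017-06-15 16:00",
                   "2017-06-15 16:10", "2017-06-15 16:20", "2017-06-15 16:30",
                   "2017-06-15 16:40", "2017-06-15 16:50", "2017-06-16 00:00",
                   "2017-06-16 00:10", "2017-06-16 00:20", "2017-06-16 00:30"),
                 format = "%Y-%m-%d %H:%M", tz = "UTC")
df <- data.frame(DT = DT, RA = c(599, 594, 606, rep(594, 5), rep(496.25, 4)))

# c(FALSE, diff(DT) == 10): row is 10 minutes after the previous row
#   (the leading FALSE covers the first row, which has no predecessor)
# $min %in% c(20, 50): row falls at :20 or :50
# - 5 * 60: the intermediate time is 300 seconds earlier
it <- with(df, DT[c(FALSE, diff(DT) == 10) & as.POSIXlt(DT)$min %in% c(20, 50)] - 5 * 60)
print(format(it, "%Y-%m-%d %H:%M", tz = "UTC"))
```

This yields 15:45, 16:15 and 16:45 on 6/15 and 00:15 on 6/16, i.e. the four rows that appear in the merged result but not in the input.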
1a) Note that if df were converted to zoo, i.e. z <- read.zoo(df, tz = ""), then this could be written as just the following, giving a zoo object as the result:
na.approx(merge(z, zoo(, it)))
2) approx. This one uses no packages. The vector it is the one computed above.
with(df, data.frame(approx(DT, RA, xout = sort(c(DT, it)))))
giving:
x y
1 2017-06-15 15:40:00 599.00
2 2017-06-15 15:45:00 596.50
3 2017-06-15 15:50:00 594.00
4 2017-06-15 16:00:00 606.00
5 2017-06-15 16:10:00 594.00
6 2017-06-15 16:15:00 594.00
7 2017-06-15 16:20:00 594.00
8 2017-06-15 16:30:00 594.00
9 2017-06-15 16:40:00 594.00
10 2017-06-15 16:45:00 594.00
11 2017-06-15 16:50:00 594.00
12 2017-06-16 00:00:00 496.25
13 2017-06-16 00:10:00 496.25
14 2017-06-16 00:15:00 496.25
15 2017-06-16 00:20:00 496.25
16 2017-06-16 00:30:00 496.25
Note: The input used for the above is:
df <- structure(list(DT = structure(c(1497555600, 1497556200, 1497556800,
1497557400, 1497558000, 1497558600, 1497559200, 1497559800, 1497585600,
1497586200, 1497586800, 1497587400), class = c("POSIXct", "POSIXt"
)), RA = c(599, 594, 606, 594, 594, 594, 594, 594, 496.25, 496.25,
496.25, 496.25)), .Names = c("DT", "RA"), row.names = c(NA, -12L
), class = "data.frame")
Update: Have revised assumption of which intermediate times to include.
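Both solutions rely on plain linear interpolation, so a point exactly halfway between two readings receives their average. A small numeric check with approx(), using minutes after 15:40 as the x-axis (an illustrative choice):

```r
# Two readings 10 minutes apart (x = minutes after 15:40): 599 then 594
res <- approx(x = c(0, 10), y = c(599, 594), xout = c(0, 5, 10))
print(res$y)
# [1] 599.0 596.5 594.0
```

The value at x = 5 is 599 + (594 - 599) * 5 / 10 = 596.5, the same as (599 + 594) / 2.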
Upvotes: 2
Reputation: 39174
We can use tidyr to expand the data frame and imputeTS to impute the missing values by linear interpolation.
library(dplyr)
library(tidyr)
library(lubridate)
library(imputeTS)
dt2 <- dt %>%
mutate(Date...Time = mdy_hm(Date...Time)) %>%
mutate(Date = as.Date(Date...Time)) %>%
group_by(Date) %>%
complete(Date...Time = seq(min(Date...Time), max(Date...Time), by = "5 min")) %>%
# na_interpolation() is the name used by current imputeTS (formerly na.interpolation())
mutate(RA.CO2 = na_interpolation(RA.CO2)) %>%
ungroup() %>%
select(Date...Time, RA.CO2)
dt2
# A tibble: 22 x 2
Date...Time RA.CO2
<dttm> <dbl>
1 2017-06-15 15:40:00 599.0
2 2017-06-15 15:45:00 596.5
3 2017-06-15 15:50:00 594.0
4 2017-06-15 15:55:00 600.0
5 2017-06-15 16:00:00 606.0
6 2017-06-15 16:05:00 600.0
7 2017-06-15 16:10:00 594.0
8 2017-06-15 16:15:00 594.0
9 2017-06-15 16:20:00 594.0
10 2017-06-15 16:25:00 594.0
# ... with 12 more rows
My output is not entirely the same as your desired output. This is because: 6/16/2017 0:10. Nevertheless, I think my solution provides a possible way to achieve this task. You may need to adjust the code yourself to fit those unclear rules.
Data used:
dt <- read.table(text = "Date...Time RA.CO2
'6/15/2017 15:40' 599
'6/15/2017 15:50' 594
'6/15/2017 16:00' 606
'6/15/2017 16:10' 594
'6/15/2017 16:20' 594
'6/15/2017 16:30' 594
'6/15/2017 16:40' 594
'6/15/2017 16:50' 594
'6/16/2017 0:00' 496.25
'6/16/2017 0:10' 496.25
'6/16/2017 0:20' 496.25
'6/16/2017 0:30' 496.25",
header = TRUE, stringsAsFactors = FALSE)
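If tidyr and imputeTS are not available, the same expand-then-interpolate idea can be sketched in base R: split by day, build a complete 5-minute sequence for each day, and fill it with approx(). This is a sketch, not the answer's code; parsing the times as UTC and the helper name expand_day are assumptions.

```r
dt <- read.table(text = "Date...Time RA.CO2
'6/15/2017 15:40' 599
'6/15/2017 15:50' 594
'6/15/2017 16:00' 606
'6/15/2017 16:10' 594
'6/15/2017 16:20' 594
'6/15/2017 16:30' 594
'6/15/2017 16:40' 594
'6/15/2017 16:50' 594
'6/16/2017 0:00' 496.25
'6/16/2017 0:10' 496.25
'6/16/2017 0:20' 496.25
'6/16/2017 0:30' 496.25",
header = TRUE, stringsAsFactors = FALSE)

# Parse the timestamps (UTC is an assumption, chosen to avoid DST shifts)
dt$Date...Time <- as.POSIXct(dt$Date...Time, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Hypothetical helper: expand one day onto a 5-minute grid, fill by linear interpolation
expand_day <- function(d) {
  grid <- seq(min(d$Date...Time), max(d$Date...Time), by = "5 min")
  y <- approx(as.numeric(d$Date...Time), d$RA.CO2, xout = as.numeric(grid))$y
  data.frame(Date...Time = grid, RA.CO2 = y)
}

# One group per calendar day, playing the role of group_by(Date)
days <- split(dt, as.Date(dt$Date...Time, tz = "UTC"))
dt2 <- do.call(rbind, lapply(days, expand_day))
rownames(dt2) <- NULL
head(dt2)
```

Grouping by day keeps the overnight gap from being filled, which is why the result has 22 rows, as in the tibble shown above.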
Upvotes: 3
Reputation: 51582
Here is a different idea using the zoo library:
library(zoo)
# duplicate every row; the copies (row names 1.1, 2.1, ...) become the new rows
df1 <- df[rep(rownames(df), each = 2), ]
# shift each copy 5 minutes later
df1$DateTime[c(FALSE, TRUE)] <- df1$DateTime[c(FALSE, TRUE)] + 5 * 60
# mean of each row and the next (n - 1 values); the last copy has no next row, so NA
df1$RA.CO2[c(FALSE, TRUE)] <- c(rollapply(df$RA.CO2, 2, mean), NA)
which gives:
DateTime RA.CO2
1 2017-06-15 15:40:00 599.000
1.1 2017-06-15 15:45:00 596.500
2 2017-06-15 15:50:00 594.000
2.1 2017-06-15 15:55:00 600.000
3 2017-06-15 16:00:00 606.000
3.1 2017-06-15 16:05:00 600.000
4 2017-06-15 16:10:00 594.000
4.1 2017-06-15 16:15:00 594.000
5 2017-06-15 16:20:00 594.000
5.1 2017-06-15 16:25:00 594.000
6 2017-06-15 16:30:00 594.000
6.1 2017-06-15 16:35:00 594.000
7 2017-06-15 16:40:00 594.000
7.1 2017-06-15 16:45:00 594.000
8 2017-06-15 16:50:00 594.000
8.1 2017-06-15 16:55:00 545.125
9 2017-06-16 00:00:00 496.250
9.1 2017-06-16 00:05:00 496.250
10 2017-06-16 00:10:00 496.250
10.1 2017-06-16 00:15:00 496.250
11 2017-06-16 00:20:00 496.250
11.1 2017-06-16 00:25:00 496.250
12 2017-06-16 00:30:00 496.250
12.1 2017-06-16 00:35:00 NA
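With width 2, rollapply(x, 2, mean) averages each pair of consecutive values and returns n - 1 results; adding by = 2 would instead step the window two positions at a time, averaging only pairs (1, 2), (3, 4), and so on. For comparison, a base-R sketch of the consecutive-pair means using vector shifting (no zoo needed):

```r
x <- c(599, 594, 606, 594, 594, 594, 594, 594, 496.25, 496.25, 496.25, 496.25)

# Mean of each consecutive pair: "all but the last" plus "all but the first", halved
pair_means <- (x[-length(x)] + x[-1]) / 2
print(pair_means)
length(pair_means)  # n - 1 = 11
```

Note that the pair spanning the overnight gap (594 at 16:50 and 496.25 at 0:00) is averaged too, since neither version looks at the timestamps.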
Upvotes: 1
Reputation: 29095
Here's a solution using dplyr:
library(dplyr)
df %>%
# calculate interpolated value between each row & next row
mutate(DT.next = lead(DT),
RA.next = lead(RA)) %>%
mutate(diff = difftime(DT.next, DT, units = "mins")) %>%
filter(as.numeric(diff) == 10) %>% #keep only 10 min intervals
mutate(DT.interpolate = DT + diff/2,
RA.interpolate = (RA + RA.next) / 2) %>%
# bind to original dataframe & sort by date
select(DT.interpolate, RA.interpolate) %>%
rename(DT = DT.interpolate, RA = RA.interpolate) %>%
rbind(df) %>%
arrange(DT)
DT RA
1 2017-06-15 15:40:00 599.00
2 2017-06-15 15:45:00 596.50
3 2017-06-15 15:50:00 594.00
4 2017-06-15 15:55:00 600.00
5 2017-06-15 16:00:00 606.00
6 2017-06-15 16:05:00 600.00
7 2017-06-15 16:10:00 594.00
8 2017-06-15 16:15:00 594.00
9 2017-06-15 16:20:00 594.00
10 2017-06-15 16:25:00 594.00
11 2017-06-15 16:30:00 594.00
12 2017-06-15 16:35:00 594.00
13 2017-06-15 16:40:00 594.00
14 2017-06-15 16:45:00 594.00
15 2017-06-15 16:50:00 594.00
16 2017-06-16 00:00:00 496.25
17 2017-06-16 00:05:00 496.25
18 2017-06-16 00:10:00 496.25
19 2017-06-16 00:15:00 496.25
20 2017-06-16 00:20:00 496.25
21 2017-06-16 00:25:00 496.25
22 2017-06-16 00:30:00 496.25
Dataset:
df <- data.frame(
DT = c(seq(from = as.POSIXct("2017-06-15 15:40"),
to = as.POSIXct("2017-06-15 16:50"),
by = "10 min"),
seq(from = as.POSIXct("2017-06-16 00:00"),
to = as.POSIXct("2017-06-16 00:30"),
by = "10 min")),
RA = c(599, 594, 606, rep(594, 5), rep(496.25, 4))
)
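The dplyr pipeline above pairs each row with the next one via lead(), keeps only 10-minute gaps, and binds the midpoints back. A base-R sketch of the same steps, recreating the dataset with tz = "UTC" added as an assumption so the example is self-contained:

```r
# Same data as the Dataset block above; tz = "UTC" is an assumption
df <- data.frame(
  DT = c(seq(as.POSIXct("2017-06-15 15:40", tz = "UTC"),
             as.POSIXct("2017-06-15 16:50", tz = "UTC"), by = "10 min"),
         seq(as.POSIXct("2017-06-16 00:00", tz = "UTC"),
             as.POSIXct("2017-06-16 00:30", tz = "UTC"), by = "10 min")),
  RA = c(599, 594, 606, rep(594, 5), rep(496.25, 4))
)

n <- nrow(df)
cur <- df[-n, ]  # every row except the last ...
nxt <- df[-1, ]  # ... paired with the row after it (what lead() provides)

# keep only pairs exactly 10 minutes apart, which skips the overnight gap
ok <- as.numeric(difftime(nxt$DT, cur$DT, units = "mins")) == 10

# midpoint time and mean value for each kept pair
mid <- data.frame(DT = cur$DT[ok] + 5 * 60,
                  RA = (cur$RA[ok] + nxt$RA[ok]) / 2)

# bind to the original rows and sort by time
out <- rbind(df, mid)
out <- out[order(out$DT), ]
rownames(out) <- NULL
print(out)
```

The result has the same 22 rows as the dplyr output.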
Upvotes: 1