Reputation: 43
I have a table that is gives me the date-time that I have received data and the count of how much data was received in a thirty minute interval. My problem is some half hour blocks are missing, and I want to insert them into the column and then insert a 0 in the count column.
Here is an example of what the table looks like:
Date-Time Count
2017-07-13 17:30:00 111
2017-07-13 18:00:00 85
2017-07-13 20:00:00 127
2017-07-13 20:30:00 515
I want it to have 18:30:00 0 and so on
Not sure how to do this if anyone has an idea that would be great.
Here is what I have tried to do:
starttime <- df[1,`Date-Time`]
for (i in df){
time <- starttime + 30
new_dt$datetime <- ifelse(df[i] = time, df$datetime, time)
new_dt$count <- ifelse(df[i] = time, df$count, 0)
}
Upvotes: 3
Views: 206
Reputation: 5689
While these work, I think the best bet is to use the padr
package:
library(dplyr)
library(padr)
pad_df <- df %>%
pad(interval = '30 mins')
If you prefer 0
's to NA
's, then simply:
pad_df[is.na(pad_df)] <- 0
The padr
package also has the thicken
function in case you need to quickly and seamlessly change to a lower frequency.
Upvotes: 2
Reputation: 76585
First of all, I've changed the name of your column Date-Time
to Date.Time
.
#dput(dat)
dat <-
structure(list(Date.Time = structure(c(1499963400, 1499965200,
1499972400, 1499974200), class = c("POSIXct", "POSIXt"), tzone = ""),
Count = c(111L, 85L, 127L, 515L)), .Names = c("Date.Time",
"Count"), row.names = c(NA, -4L), class = "data.frame")
Now, the trick is to use seq.POSIXct
to create a df with just one column, then merge
the two dfs.
tmp <- data.frame(
Date.Time = seq(min(dat$Date.Time), max(dat$Date.Time), by = "30 min"))
tmp
Date.Time
1 2017-07-13 17:30:00
2 2017-07-13 18:00:00
3 2017-07-13 18:30:00
4 2017-07-13 19:00:00
5 2017-07-13 19:30:00
6 2017-07-13 20:00:00
7 2017-07-13 20:30:00
merge(dat, tmp, all.y = TRUE)
Date.Time Count
1 2017-07-13 17:30:00 111
2 2017-07-13 18:00:00 85
3 2017-07-13 18:30:00 NA
4 2017-07-13 19:00:00 NA
5 2017-07-13 19:30:00 NA
6 2017-07-13 20:00:00 127
You can then rm(tmp)
if you want.
Upvotes: 2
Reputation: 8317
First let's create some dummy data.
library(tidyverse)
library(lubridate)
time_series <- tibble(
DateTime = c(
"2017-07-13 17:30:00",
"2017-07-13 18:00:00",
"2017-07-13 20:00:00",
"2017-07-13 20:30:00"
),
Count = c(111, 85, 127, 515)
) %>%
mutate(DateTime = ymd_hms(DateTime))
Now let's figure out the smallest and largest datetimes that we have in the data.
from <- min(time_series$DateTime)
to <- max(time_series$DateTime)
Finally, let's create a sequence of dates from from
to to
at 30 minute intervals. We then join the existing data to that sequence and replace any missing values of Count
with zero.
tibble(DateTime = seq(from = from, to = to, by = 1800)) %>%
left_join(time_series) %>%
mutate(Count = ifelse(is.na(Count), 0, Count))
Upvotes: 2