Reputation: 55
I'm trying to create looping to replace missing time series data with value == 0
.
This is my data:
df
Times value
05-03-2018 09:00:00 1
05-03-2018 09:01:26 2
05-03-2018 09:04:28 1
05-03-2018 09:07:05 2
05-03-2018 09:09:05 1
and my desired output is:
Times value
05-03-2018 09:00:00 1
05-03-2018 09:01:26 2
05-03-2018 09:02:00 0
05-03-2018 09:03:00 0
05-03-2018 09:04:28 1
05-03-2018 09:05:00 0
05-03-2018 09:06:00 0
05-03-2018 09:07:05 2
05-03-2018 09:08:00 0
05-03-2018 09:09:05 1
Missing minutes in the data should be created and assigned a value of 0.
What should I do? Create new dummies table with missing minute or make a sequence looping?
Upvotes: 2
Views: 161
Reputation: 7337
library(tidyverse)
library(lubridate)
library(magrittr)
df <- tibble(
Times = c("05-03-2018 09:00:00", "05-03-2018 09:01:26",
"05-03-2018 09:04:28", "05-03-2018 09:07:05",
"05-03-2018 09:09:05"),
value = c(1, 2, 1, 2, 1)
)
Parse your Times
variable to datetime
df$Times %<>% parse_datetime("%d-%m-%Y %H:%M:%S")
Create a new variable join
that is truncated to the minute
df %<>% mutate(join = floor_date(Times, unit = "minute"))
Create a new data frame with one variable also called join
and containing every minute in your range
all <- tibble(
join = seq(as_datetime(first(df$Times), as_datetime(last(df$Times)), by = 60)
)
Join both data frames
result <- left_join(all, df)
Add the "missing minutes" to your Times
variable
result$Times[is.na(result$Times)] <- result$join[is.na(result$Times)]
Replace the NA
by 0
result$value[is.na(result$value)] <- 0
Remove the join
variable
result %>%
select(- join)
# A tibble: 10 x 2
Times value
<dttm> <dbl>
1 2018-03-05 09:00:00 1
2 2018-03-05 09:01:26 2
3 2018-03-05 09:02:00 0
4 2018-03-05 09:03:00 0
5 2018-03-05 09:04:28 1
6 2018-03-05 09:05:00 0
7 2018-03-05 09:06:00 0
8 2018-03-05 09:07:05 2
9 2018-03-05 09:08:00 0
10 2018-03-05 09:09:05 1
Upvotes: 1
Reputation: 72758
You could create a second 'complete' data frame and merge them together.
dif <- diff(as.numeric(range(df1$Times)))
df1 <- merge(df1,
data.frame(Times=as.POSIXct(0:(dif/60)*60,
origin=df1[1, 1], tz="UTC")), all=TRUE)
Then replace resulting NA
s with 0
.
df1[is.na(df1$value), 2] <- 0
Finally remove the duplicates.
df1 <- df1[-which(duplicated(strftime(df1$Times, format="%M"))) + 1, ]
Yields:
> df1
Times value
1 2018-03-05 09:00:00 1
3 2018-03-05 09:01:26 2
4 2018-03-05 09:02:00 0
5 2018-03-05 09:03:00 0
7 2018-03-05 09:04:28 1
8 2018-03-05 09:05:00 0
9 2018-03-05 09:06:00 0
11 2018-03-05 09:07:05 2
12 2018-03-05 09:08:00 0
14 2018-03-05 09:09:05 1
Data:
df1 <- structure(list(Times = structure(c(1520240400, 1520240486, 1520240668,
1520240825, 1520240945), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
value = c(1, 2, 1, 2, 1)), row.names = c(NA, -5L), class = "data.frame")
Upvotes: 2
Reputation: 23598
You could do this with dplyr and padr packages. padr is very useful for extending datetime series between to dates or adding missing values.
library(dplyr)
library(padr)
df1 %>%
thicken(interval = "min") %>% # roll time series up to minutes
pad(by = "Times_min") %>% # add missing minute intervals
fill_by_value(value) %>% # fill missing values with 0
mutate(Times = if_else(is.na(Times), Times_min, Times)) %>% # fill NA's in Times column
select(-Times_min) # drop not needed column
pad applied on the interval: min
Times value
1 2018-03-05 09:00:00 1
2 2018-03-05 09:01:26 2
3 2018-03-05 09:02:00 0
4 2018-03-05 09:03:00 0
5 2018-03-05 09:04:28 1
6 2018-03-05 09:05:00 0
7 2018-03-05 09:06:00 0
8 2018-03-05 09:07:05 2
9 2018-03-05 09:08:00 0
10 2018-03-05 09:09:05 1
data:
df1 <- structure(list(Times = structure(c(1520240400, 1520240486, 1520240668,
1520240825, 1520240945), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
value = c(1, 2, 1, 2, 1)), row.names = c(NA, -5L), class = "data.frame")
Upvotes: 2