How to replace missing value in time series data by looping?

I'm trying to create looping to replace missing time series data with value == 0.

This is my data:

df
Times                   value
05-03-2018 09:00:00      1
05-03-2018 09:01:26      2
05-03-2018 09:04:28      1
05-03-2018 09:07:05      2
05-03-2018 09:09:05      1

and my desired output is:

Times                   value
05-03-2018 09:00:00      1
05-03-2018 09:01:26      2
05-03-2018 09:02:00      0
05-03-2018 09:03:00      0
05-03-2018 09:04:28      1
05-03-2018 09:05:00      0
05-03-2018 09:06:00      0
05-03-2018 09:07:05      2
05-03-2018 09:08:00      0
05-03-2018 09:09:05      1

Missing minutes in the data should be created and assigned a value of 0.

What should I do? Create new dummies table with missing minute or make a sequence looping?

Upvotes: 2

Views: 161

Answers (3)

prosoitos
prosoitos

Reputation: 7337

library(tidyverse)
library(lubridate)
library(magrittr)

Recreate your data

df <- tibble(
  Times = c("05-03-2018 09:00:00", "05-03-2018 09:01:26",
            "05-03-2018 09:04:28", "05-03-2018 09:07:05",
            "05-03-2018 09:09:05"),
  value = c(1, 2, 1, 2, 1)
)

Code

Parse your Times variable to datetime

df$Times %<>% parse_datetime("%d-%m-%Y %H:%M:%S")

Create a new variable join that is truncated to the minute

df %<>% mutate(join = floor_date(Times, unit = "minute"))

Create a new data frame with one variable also called join and containing every minute in your range

all <- tibble(
  join = seq(as_datetime(first(df$Times), as_datetime(last(df$Times)), by = 60)
)

Join both data frames

result <- left_join(all, df)

Add the "missing minutes" to your Times variable

result$Times[is.na(result$Times)] <- result$join[is.na(result$Times)]

Replace the NA by 0

result$value[is.na(result$value)] <- 0

Remove the join variable

result %>%
  select(- join)

Result

# A tibble: 10 x 2
   Times               value
   <dttm>              <dbl>
 1 2018-03-05 09:00:00     1
 2 2018-03-05 09:01:26     2
 3 2018-03-05 09:02:00     0
 4 2018-03-05 09:03:00     0
 5 2018-03-05 09:04:28     1
 6 2018-03-05 09:05:00     0
 7 2018-03-05 09:06:00     0
 8 2018-03-05 09:07:05     2
 9 2018-03-05 09:08:00     0
10 2018-03-05 09:09:05     1

Upvotes: 1

jay.sf
jay.sf

Reputation: 72758

You could create a second 'complete' data frame and merge them together.

dif <- diff(as.numeric(range(df1$Times)))
df1 <- merge(df1, 
             data.frame(Times=as.POSIXct(0:(dif/60)*60, 
                                         origin=df1[1, 1], tz="UTC")), all=TRUE)

Then replace resulting NAs with 0.

df1[is.na(df1$value), 2] <- 0

Finally remove the duplicates.

df1 <- df1[-which(duplicated(strftime(df1$Times, format="%M"))) + 1, ]

Yields:

> df1
                 Times value
1  2018-03-05 09:00:00     1
3  2018-03-05 09:01:26     2
4  2018-03-05 09:02:00     0
5  2018-03-05 09:03:00     0
7  2018-03-05 09:04:28     1
8  2018-03-05 09:05:00     0
9  2018-03-05 09:06:00     0
11 2018-03-05 09:07:05     2
12 2018-03-05 09:08:00     0
14 2018-03-05 09:09:05     1

Data:

df1 <- structure(list(Times = structure(c(1520240400, 1520240486, 1520240668, 
1520240825, 1520240945), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    value = c(1, 2, 1, 2, 1)), row.names = c(NA, -5L), class = "data.frame")

Upvotes: 2

phiver
phiver

Reputation: 23598

You could do this with dplyr and padr packages. padr is very useful for extending datetime series between to dates or adding missing values.

library(dplyr)
library(padr)

df1 %>% 
  thicken(interval = "min") %>% # roll time series up to minutes
  pad(by = "Times_min") %>% # add missing minute intervals
  fill_by_value(value) %>% # fill missing values with 0
  mutate(Times = if_else(is.na(Times), Times_min, Times)) %>% # fill NA's in Times column
  select(-Times_min) # drop not needed column

pad applied on the interval: min
                 Times value
1  2018-03-05 09:00:00     1
2  2018-03-05 09:01:26     2
3  2018-03-05 09:02:00     0
4  2018-03-05 09:03:00     0
5  2018-03-05 09:04:28     1
6  2018-03-05 09:05:00     0
7  2018-03-05 09:06:00     0
8  2018-03-05 09:07:05     2
9  2018-03-05 09:08:00     0
10 2018-03-05 09:09:05     1

data:

df1 <- structure(list(Times = structure(c(1520240400, 1520240486, 1520240668, 
1520240825, 1520240945), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    value = c(1, 2, 1, 2, 1)), row.names = c(NA, -5L), class = "data.frame")

Upvotes: 2

Related Questions