r c

Reputation: 49

Filling in unobserved observations

I want to build a time series of how often each date and time is observed. The raw data looks something like this:

dd-mm-yyyy hh:mm
28-2-2018 0:12
28-2-2018 11:16
28-2-2018 12:12
28-2-2018 13:22
28-2-2018 14:23
28-2-2018 14:14
28-2-2018 16:24

This date and time format is not one R parses by default, so I had to convert it:

extracted_times <- as.POSIXct(bedrijf.CSV$viewed_at, format = "%d-%m-%Y %H:%M")    

I tabulated the frequency of each date and time using the following code:

timeserieswithoutzeros <- table(extracted_times)    

The data looks something like this now:

2018-02-28 00:11:00 2018-02-28 01:52:00 2018-02-28 03:38:00                   
                  1                   2                   5 
2018-02-28 04:10:00 2018-02-28 04:40:00 2018-02-28 04:45:00                  
                  2                   1                   1    

As you can see, there are a lot of unobserved dates and times. I want to add these unobserved dates and times with a frequency of 0. I tried the complete function, but the error states that it can't be used because I use as.POSIXct(). Any ideas?

Upvotes: 1

Views: 108

Answers (2)

Rui Barradas

Reputation: 76402

Maybe the following is what you want.
First, coerce the data to class "POSIXct" and create the sequence of all date-times between the minimum and the maximum in steps of 1 minute.

bedrijf.CSV$viewed_at <- as.POSIXct(bedrijf.CSV$viewed_at, format = "%d-%m-%Y %H:%M")
new <- seq(min(bedrijf.CSV$viewed_at), 
           max(bedrijf.CSV$viewed_at), 
           by = "1 mins")
tmp <- data.frame(viewed_at = new)

Now check which of these values occur in the original data.

tmp$viewed <- tmp$viewed_at %in% bedrijf.CSV$viewed_at
tbl <- xtabs(viewed ~ viewed_at, tmp)

sum(tbl != 0)
#[1] 7

Final clean up.

rm(new, tmp)
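
Note that %in% only records whether a minute occurs at all, so tbl holds 0 or 1 per minute rather than the number of views. If actual counts are needed, one possible variant is to tabulate the observations as a factor whose levels are the full minute sequence. This is only a sketch: viewed_at and all_minutes are made-up names, and the three timestamps stand in for bedrijf.CSV$viewed_at.

```r
# Hypothetical sample data standing in for bedrijf.CSV$viewed_at
viewed_at <- as.POSIXct(
  c("28-2-2018 0:12", "28-2-2018 0:12", "28-2-2018 0:15"),
  format = "%d-%m-%Y %H:%M", tz = "UTC"
)

# Every minute between the first and the last observation
all_minutes <- seq(min(viewed_at), max(viewed_at), by = "1 min")

# An explicit format keeps the labels of both vectors comparable
fmt <- "%Y-%m-%d %H:%M"

# Using the full sequence as factor levels keeps unobserved minutes with count 0
counts <- table(factor(format(viewed_at, fmt), levels = format(all_minutes, fmt)))
counts
```

With this sample the four minutes from 00:12 to 00:15 get the counts 2, 0, 0 and 1.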

Upvotes: 0

GKi

Reputation: 39657

As already mentioned in the comments by @eric-lecoutre, you can combine your observations with a sequence, built with seq, that begins at the earliest and ends at the latest date and time, and then subtract 1 from the frequency table.

timeseriesWithzeros <- table(c(extracted_times, seq(min(extracted_times), max(extracted_times), "1 min")))-1
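
The subtraction works because appending the full minute sequence adds exactly one artificial observation to every minute, so each count ends up one too high. A small check of this, assuming three made-up timestamps in place of the real extracted_times (all_minutes is a name introduced here for illustration):

```r
# Hypothetical timestamps standing in for extracted_times
extracted_times <- as.POSIXct(
  c("28-2-2018 0:12", "28-2-2018 0:12", "28-2-2018 0:15"),
  format = "%d-%m-%Y %H:%M", tz = "UTC"
)

# One artificial observation for every minute in the observed range
all_minutes <- seq(min(extracted_times), max(extracted_times), by = "1 min")

# Tabulate real plus artificial observations, then remove the artificial one
timeseriesWithzeros <- table(c(extracted_times, all_minutes)) - 1
timeseriesWithzeros
```

Here 00:12 was observed twice, 00:13 and 00:14 never, and 00:15 once, so the table should hold 2, 0, 0 and 1.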

Upvotes: 1
