Reputation: 97
My data are like:
Date Value
00:00 10
01:00 8
02:00 1
04:00 4
...
Some data are missing if the value=0
. My question is how to fill these data back in. Like, after 02:00 17
, fill in a row 03:00 0
.
I have done some search, but only found solutions to replace NAs
with 0
. In my case, my data are not even showing in the data frame. Is there a way to check whether there's a gap between adjacent data?
Upvotes: 2
Views: 225
Reputation: 20811
Two simple ways I can think of in base r:
s <- format(seq(s <- as.POSIXct('2000-01-01'), s + 3.6e4, by = 'hour'), '%H:%M')
# [1] "00:00" "01:00" "02:00" "03:00" "04:00" "05:00" "06:00" "07:00" "08:00"
# [10] "09:00" "10:00"
ss <- s[c(1:3, 5)]
dd <- data.frame(hour = ss, value = c(10, 8, 1, 4), stringsAsFactors = FALSE)
# hour value
# 1 00:00 10
# 2 01:00 8
# 3 02:00 1
# 4 04:00 4
I made two sample vectors, s
are the times you want to fill in, and ss
are the times in your data frame that you have. presumably you already have both of these, so then you can
all = TRUE
so that there are no duplicates; then replace the NA
with 0dm <- data.frame(hour = s)
out <- merge(dm, dd, all = TRUE)
# hour value
# 1 00:00 10
# 2 01:00 8
# 3 02:00 1
# 4 03:00 NA
# 5 04:00 4
# 6 05:00 NA
# 7 06:00 NA
# 8 07:00 NA
# 9 08:00 NA
# 10 09:00 NA
# 11 10:00 NA
out[is.na(out)] <- 0
# hour value
# 1 00:00 10
# 2 01:00 8
# 3 02:00 1
# 4 03:00 0
# 5 04:00 4
# 6 05:00 0
# 7 06:00 0
# 8 07:00 0
# 9 08:00 0
# 10 09:00 0
# 11 10:00 0
## giving the times explicitly
out <- rbind(dd, data.frame(hour = sprintf('%02s:00', c(3, 5:10)), value = 0))
## or more programmatically:
out <- rbind(dd, data.frame(hour = setdiff(s, dd$hour),
value = 0))
out[order(out$hour), ]
# hour value
# 1 00:00 10
# 2 01:00 8
# 3 02:00 1
# 5 03:00 0
# 4 04:00 4
# 6 05:00 0
# 7 06:00 0
# 8 07:00 0
# 9 08:00 0
# 10 09:00 0
# 11 10:00 0
Upvotes: 2
Reputation: 31161
Here is an approach using data.table
:
library(data.table)
data = data.frame(Date=as.Date(c('2015-03-20','2015-03-24','2015-03-25','2015-03-28')),
Value=c(1,2,3,4))
# Date Value
#1 2015-03-20 1
#2 2015-03-24 2
#3 2015-03-25 3
#4 2015-03-28 4
dt = data.table(Date=seq(min(data$Date), max(data$Date), by='days'))
setkey(setDT(data), Date)[dt][!data, Value:=0][]
# Date Value
#1: 2015-03-20 1
#2: 2015-03-21 0
#3: 2015-03-22 0
#4: 2015-03-23 0
#5: 2015-03-24 2
#6: 2015-03-25 3
#7: 2015-03-26 0
#8: 2015-03-27 0
#9: 2015-03-28 4
It is basically a join on the resampling table - setkey(setDT(data), Date)[dt]
- you want (you have to define it, here this is dt
). Then you replace values not present in your original dataset with 0
's - [!data, Value:=0]
Upvotes: 3