My data frame:
  time              value
0 01-01-2015 00:00     72
1 01-01-2015 01:00     74
2 01-01-2015 02:00     75
3 01-01-2015 03:00     77
4 01-01-2015 06:00     72
If I pass this data frame to pandas, it gives me 24 entries, and the missing hours have zero in the value column of the output (this is also what I want).
The pandas code:
resample_factor="H"
data_frame = data_frame.resample(resample_factor).mean()
I have already looked at some links, but they were not helpful.
Can we do this in R?
Please suggest how we can do that, if it is possible.
Maybe you are looking for tidyr::complete to fill in the missing hours. This creates an hourly sequence of 24 hours starting from the first value of time.
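library(dplyr)

df %>%
  # parse the "dd-mm-yyyy HH:MM" strings into date-times
  mutate(V2 = as.POSIXct(V2, format = "%d-%m-%Y %H:%M")) %>%
  arrange(V2) %>%
  # 24 hourly steps from the first time stamp (86400 s = 1 day, minus 1 hour)
  tidyr::complete(V2 = seq(first(V2), first(V2) + 86400 - (60 * 60), by = "1 hour"),
                  fill = list(V1 = 0, V3 = 0))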
# V2 V1 V3
# <dttm> <dbl> <dbl>
# 1 2015-01-01 00:00:00 0 72
# 2 2015-01-01 01:00:00 1 74
# 3 2015-01-01 02:00:00 2 75
# 4 2015-01-01 03:00:00 3 77
# 5 2015-01-01 04:00:00 0 0
# 6 2015-01-01 05:00:00 0 0
# 7 2015-01-01 06:00:00 4 72
# 8 2015-01-01 07:00:00 0 0
# 9 2015-01-01 08:00:00 0 0
#10 2015-01-01 09:00:00 0 0
# … with 14 more rows
If the time doesn't start at 00:00, we can extract the date from the date-time and create a sequence of 24 hours starting at midnight.
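df %>%
  # tz = "GMT" so that midnight from as.POSIXct(as.Date(...)), which is UTC midnight, matches the parsed times
  mutate(V2 = as.POSIXct(V2, format = "%d-%m-%Y %H:%M", tz = "GMT")) %>%
  tidyr::complete(V2 = seq(as.POSIXct(as.Date(first(V2))), by = "1 hour",
                           length.out = 24),
                  fill = list(V1 = 0, V3 = 0))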
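If the data can span more than one day, the same complete-and-fill idea can run from midnight of the first day to 23:00 of the last day. Below is a base R sketch of that generalization (no dplyr/tidyr), assuming "dd-mm-yyyy HH:MM" strings in column V2 and GMT times; the helper name complete_hours is hypothetical, not part of any package.

```r
# Hypothetical helper (not from either answer): complete an hourly grid
# covering whole days, then fill the gaps with a constant.
complete_hours <- function(df, time_col = "V2", fill = 0, tz = "GMT") {
  # Parse the "dd-mm-yyyy HH:MM" strings (factor or character)
  times <- as.POSIXct(as.character(df[[time_col]]),
                      format = "%d-%m-%Y %H:%M", tz = tz)
  # Midnight of the first day through 23:00 of the last day
  start <- as.POSIXct(format(min(times), "%Y-%m-%d"), format = "%Y-%m-%d", tz = tz)
  end <- as.POSIXct(format(max(times), "%Y-%m-%d"), format = "%Y-%m-%d", tz = tz) +
    23 * 3600
  grid <- data.frame(time = seq(start, end, by = "1 hour"))
  df[[time_col]] <- times
  # Left join: keep every hour of the grid, NA where there is no reading
  out <- merge(grid, df, by.x = "time", by.y = time_col, all.x = TRUE)
  # The time column is complete, so every NA is in a value column
  out[is.na(out)] <- fill
  out
}

dd <- data.frame(
  V1 = 0:4,
  V2 = c("01-01-2015 00:00", "01-01-2015 01:00", "01-01-2015 02:00",
         "01-01-2015 03:00", "01-01-2015 06:00"),
  V3 = c(72L, 74L, 75L, 77L, 72L)
)
res <- complete_hours(dd)
head(res, 8)
```

With one day of data this gives 24 rows; a reading on the following day would extend the grid to 48 rows.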
data
df <- structure(list(V1 = 0:4, V2 = structure(1:5, .Label = c("01-01-2015 00:00",
"01-01-2015 01:00", "01-01-2015 02:00", "01-01-2015 03:00", "01-01-2015 06:00"
), class = "factor"), V3 = c(72L, 74L, 75L, 77L, 72L)), class =
"data.frame", row.names = c(NA, -5L))
Here is a base R idea:
# midnight of the first day: with this format the trailing " HH:MM" is ignored
start <- as.POSIXct(dd$V2[1], format = '%d-%m-%Y')
# 24 hourly time stamps: start, start + 1 h, ..., start + 23 h (82800 s)
dates1 <- seq(start, start + 82800, by = '1 hour')
merge(transform(dd, V2 = as.POSIXct(V2, format = '%d-%m-%Y %H:%M')),
      data.frame(V2 = dates1),
      by = 'V2', all = TRUE)
which gives:
                    V2 V1 V3
1  2015-01-01 00:00:00  0 72
2  2015-01-01 01:00:00  1 74
3  2015-01-01 02:00:00  2 75
4  2015-01-01 03:00:00  3 77
5  2015-01-01 04:00:00 NA NA
6  2015-01-01 05:00:00 NA NA
7  2015-01-01 06:00:00  4 72
8  2015-01-01 07:00:00 NA NA
9  2015-01-01 08:00:00 NA NA
10 2015-01-01 09:00:00 NA NA
11 2015-01-01 10:00:00 NA NA
12 2015-01-01 11:00:00 NA NA
13 2015-01-01 12:00:00 NA NA
14 2015-01-01 13:00:00 NA NA
15 2015-01-01 14:00:00 NA NA
16 2015-01-01 15:00:00 NA NA
17 2015-01-01 16:00:00 NA NA
18 2015-01-01 17:00:00 NA NA
19 2015-01-01 18:00:00 NA NA
20 2015-01-01 19:00:00 NA NA
21 2015-01-01 20:00:00 NA NA
22 2015-01-01 21:00:00 NA NA
23 2015-01-01 22:00:00 NA NA
24 2015-01-01 23:00:00 NA NA
NOTE: You can replace the NA values (for example with 0) in the usual way.
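One usual way to get the zeros the question asks for is to index the merged result with is.na(). A self-contained sketch, assuming GMT times (the tz argument is an addition, not in the answer) and the same data as in DATA:

```r
dd <- data.frame(
  V1 = 0:4,
  V2 = c("01-01-2015 00:00", "01-01-2015 01:00", "01-01-2015 02:00",
         "01-01-2015 03:00", "01-01-2015 06:00"),
  V3 = c(72L, 74L, 75L, 77L, 72L)
)
# Midnight of the first day; the trailing " HH:MM" is ignored by this format
start <- as.POSIXct(dd$V2[1], format = "%d-%m-%Y", tz = "GMT")
dates1 <- seq(start, by = "1 hour", length.out = 24)
res <- merge(transform(dd, V2 = as.POSIXct(V2, format = "%d-%m-%Y %H:%M", tz = "GMT")),
             data.frame(V2 = dates1),
             by = "V2", all = TRUE)
# V2 is complete after the merge, so every NA is in V1 or V3
res[is.na(res)] <- 0
res
```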
DATA
dput(dd)
structure(list(V1 = 0:4, V2 = c("01-01-2015 00:00", "01-01-2015 01:00",
"01-01-2015 02:00", "01-01-2015 03:00", "01-01-2015 06:00"),
V3 = c(72L, 74L, 75L, 77L, 72L)), row.names = c(NA, -5L), class = "data.frame")
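Neither snippet averages several readings that fall in the same hour, which is what the .mean() in pandas' resample("H").mean() does. If the raw data can contain more than one reading per hour, one option is to bucket and average first, then merge onto the 24-hour grid. A base R sketch, assuming GMT times; the raw data frame below, with two readings inside the 00:00 hour, is made up purely for illustration:

```r
# Made-up example data: two readings inside the 00:00 hour
raw <- data.frame(
  time  = c("01-01-2015 00:15", "01-01-2015 00:45", "01-01-2015 03:00"),
  value = c(70, 74, 77)
)
raw$time <- as.POSIXct(raw$time, format = "%d-%m-%Y %H:%M", tz = "GMT")

# Bucket each reading into the start of its hour, then average per bucket
raw$hour <- as.POSIXct(format(raw$time, "%Y-%m-%d %H:00"),
                       format = "%Y-%m-%d %H:%M", tz = "GMT")
hourly <- aggregate(value ~ hour, data = raw, FUN = mean)

# Full 24-hour grid for the first day, NA hours set to 0
start <- as.POSIXct(format(min(raw$time), "%Y-%m-%d"), format = "%Y-%m-%d", tz = "GMT")
grid <- data.frame(hour = seq(start, by = "1 hour", length.out = 24))
res <- merge(grid, hourly, by = "hour", all.x = TRUE)
res$value[is.na(res$value)] <- 0
head(res, 5)
```

Here the 00:00 bucket holds the mean of 70 and 74, the 03:00 bucket holds 77, and every other hour is 0.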