How to average over a time period by hour?

Question

I'm new to R and am running into my first difficulties. I have a data set of ca. 10000 obs. covering 365 days, in which I record occurrences of an event. These occurrences are recorded only for the first 14 days of each month. I would like to fill in the remaining 16 days by averaging, by hour, over the previous occurrences in the corresponding month.

The structure is as follows:

                    day           hours      occurrence
                    2000-01-01     1          5
                    2000-01-01     2          6
                    2000-01-01     3          7
                    ...            ...        ...
                    2000-01-01     23         3
                    2000-01-01     24         2
                    ...            ...        ...
                    2000-01-02     1          4
                    2000-01-02     2          2
                    2000-01-02     3          5
                    ...            ...        ...
                    2000-01-02     23         2
                    2000-01-02     24         1
                    ...
                    ...
                    2000-01-15     1          mean of hour 1 over days 01-14, (5 + 4 + ...)/14
                    2000-01-15     2          mean of hour 2 over days 01-14, (6 + 2 + ...)/14
                    2000-01-15     3          mean of hour 3 over days 01-14, (7 + 5 + ...)/14
                    ...            ...         ...
                    2000-01-15     23         mean of hour 23 over days 01-14
                    2000-01-15     24         mean of hour 24 over days 01-14
                    ...            ...         ...
                    ...            ...         ...
                    2000-01-30
                    2000-01-30
                    2000-01-30
                    2000-01-30
                    ...            ...         ...
                    ...            ...         ...
                    2000-02-01
                    2000-02-01
                    2000-02-01
                    2000-02-01
                    ...            ...         ...
                    ...
                    ...            ...         ...
                    2000-12-24

I tried

               aggregate( occurences ~ hours, mean)

but the results were meaningless, so I tried

               tapply( X = occurences, INDEX = list(hours), FUN = Mean )

Unfortunately, neither worked as I expected. I think the corresponding month has to be included in the function call, but my means to do so seem to be limited.

Henrik · Accepted Answer

You may try this. Note that, to keep the example small, I select data only for days 1-4 and hours 0-1 of each month. Days 1 and 2 of each month have data on occurrence; days 3 and 4 are missing it.

library(dplyr)

# create dummy data
set.seed(123) # for reproducibility of sample

d1 <- data.frame(time = seq(from = as.POSIXct("2000-01-01"), 
                            to = as.POSIXct("2000-02-28"),
                            by = "hour"))
d1 <- d1 %>%
  mutate(hour = as.integer(format(time, "%H")),
         day = as.integer(format(time, "%d")), # <~~ only needed to generate sample data
         month = as.integer(format(time, "%m")),
         occurence = sample(1:10, length(time), replace = TRUE),
         occurence = ifelse(day %in% 1:2, occurence, NA)) %>%  # <~~~ data only for day 1-2
  filter(hour %in% 0:1 & day %in% 1:4) %>%  # <~~~ smaller example: select hour 0-1, day 1-4
  select(-day)

# calculate mean occurrence per month and hour
d2 <- d1 %>%
  group_by(month, hour) %>%
  summarise(mean_occ = round(mean(occurence, na.rm = TRUE), 1))
d2
#   month hour mean_occ
# 1     1    0      5.0
# 2     1    1      8.0
# 3     2    0      5.5
# 4     2    1      6.5


# replace missing occurrence with mean_occ
d3 <- d1 %>%
  left_join(d2, by = c("hour", "month")) %>%
  mutate(occurence2 = ifelse(is.na(occurence), mean_occ, occurence)) %>%
  select(-month, -mean_occ)

d3
#    hour                time occurence occurence2
# 1     0 2000-01-01 00:00:00         3        3.0
# 2     1 2000-01-01 01:00:00         8        8.0
# 3     0 2000-01-02 00:00:00         7        7.0
# 4     1 2000-01-02 01:00:00         8        8.0
# 5     0 2000-01-03 00:00:00        NA        5.0
# 6     1 2000-01-03 01:00:00        NA        8.0
# 7     0 2000-01-04 00:00:00        NA        5.0
# 8     1 2000-01-04 01:00:00        NA        8.0
# 9     0 2000-02-01 00:00:00         4        4.0
# 10    1 2000-02-01 01:00:00         6        6.0
# 11    0 2000-02-02 00:00:00         7        7.0
# 12    1 2000-02-02 01:00:00         7        7.0
# 13    0 2000-02-03 00:00:00        NA        5.5
# 14    1 2000-02-03 01:00:00        NA        6.5
# 15    0 2000-02-04 00:00:00        NA        5.5
# 16    1 2000-02-04 01:00:00        NA        6.5
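
For comparison, the same fill can be sketched in base R with ave(), which returns the group mean aligned to every row, so no join is needed. This is only a sketch under assumptions: the data frame `df`, the helper column `month`, and the result column `occurrence_filled` are made-up names, and the toy values are the first two hours of the first two days from the question, with days 3 and 4 left missing.

```r
# Hypothetical data in the asker's layout (day, hours, occurrence):
# 4 days x 2 hours in one month, occurrence recorded on days 1-2 only.
df <- data.frame(
  day        = as.Date(rep(c("2000-01-01", "2000-01-02",
                             "2000-01-03", "2000-01-04"), each = 2)),
  hours      = rep(1:2, times = 4),
  occurrence = c(5, 6, 4, 2, NA, NA, NA, NA)
)

# Year-month key, so that averages are never pooled across months
df$month <- format(df$day, "%Y-%m")

# For every row, the mean of its (month, hours) group, ignoring NAs
grp_mean <- ave(df$occurrence, df$month, df$hours,
                FUN = function(x) mean(x, na.rm = TRUE))

# Keep observed values; fill the gaps with the group mean
df$occurrence_filled <- ifelse(is.na(df$occurrence), grp_mean, df$occurrence)
df
```

Grouping by both month and hours is what the aggregate() and tapply() attempts in the question were missing: grouping by hours alone pools all months into a single average per hour.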
