Reputation: 4008
I have a df with 3 columns: day, hour, sessions.
I need to complete hours that are not present in data frame with 0, based on weekdays. Hours of days goes from 1 to 23.
For example, for Sunday there is data from 6 am, but not for < 6 am (1,2,3,4,5).
Is there a function for these? pkg?
Original data example:
day hour sessions
<fct> <chr> <dbl>
1 Sun 06 1.00
2 Sun 10 2.00
3 Sun 11 2.00
4 Sun 12 1.00
5 Sun 18 1.00
6 Sun 20 2.00
Desired output: (after 6 am, there are more hours to be added with 0, but you get the idea)
day hour sessions
<fct> <chr> <dbl>
1 Sun 01 0.00
2 Sun 02 0.00
3 Sun 03 0.00
4 Sun 04 0.00
5 Sun 05 0.00
6 Sun 06 1.00
7 Sun 10 2.00
8 Sun 11 2.00
9 Sun 12 1.00
10 Sun 18 1.00
11 Sun 20 2.00
Data:
df <- structure(list(day = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), .Label = c("Sun", "Sat",
"Fri", "Thu", "Wed", "Tue", "Mon"), class = "factor"), hour = c("06",
"10", "11", "12", "18", "20", "21", "23", "01", "08", "09", "11",
"13", "14", "16", "20", "21", "23", "07", "08", "09", "10", "11",
"12", "14", "15", "16", "17", "18", "19", "20", "21", "23", "08",
"10", "11", "12", "13", "14", "15", "16", "18", "19", "20", "04",
"09", "10", "11", "13", "15", "16", "17", "18", "19", "06", "07",
"09", "10", "11", "12", "15", "16", "17", "18", "21", "23", "04",
"06", "09", "10", "11", "12", "14", "15", "16", "17", "19", "20",
"21", "22"), sessions = c(1, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 1,
2, 1, 2, 1, 2, 1, 1, 1, 2, 2, 3, 2, 3, 1, 1, 1, 2, 2, 1, 2, 2,
1, 2, 1, 1, 1, 1, 3, 1, 1, 2, 3, 1, 1, 1, 3, 1, 2, 4, 4, 3, 1,
1, 2, 1, 1, 3, 4, 2, 2, 1, 1, 3, 1, 1, 2, 2, 1, 1, 4, 1, 1, 2,
1, 1, 1, 2, 3)), .Names = c("day", "hour", "sessions"), row.names = c(NA,
-80L), totals = list(structure(list(sessions = "134"), .Names = "sessions")), minimums = list(
structure(list(sessions = "1"), .Names = "sessions")), maximums = list(
structure(list(sessions = "4"), .Names = "sessions")), rowCount = 116L, vars = "day", drop = TRUE, class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
Upvotes: 0
Views: 120
Reputation: 3943
Here is a simple solution with the dplyr
package. I coerced the hour
variable to a numeric first.
library(dplyr)
allHours <- function(data) {
data.frame(hour = 1:23,
sessions = sapply(1:23, function(x) sum(data$sessions[data$hour == x])))
}
df %>%
mutate(hour = as.numeric(hour)) %>%
group_by(day) %>%
do(allHours(.))
Output:
# A tibble: 161 x 3
# Groups: day [7]
day hour sessions
<fctr> <int> <dbl>
1 Sun 1 0
2 Sun 2 0
3 Sun 3 0
4 Sun 4 0
5 Sun 5 0
6 Sun 6 1
7 Sun 7 0
8 Sun 8 0
9 Sun 9 0
10 Sun 10 2
# ... with 151 more rows
Upvotes: 2