Reputation: 861
I have a data table like below:
city year temp
Seattle 2019 82
Seattle 2018 10
NYC 2010 78
DC 2011 71
DC 2011 10
DC 2018 60
I would like to group them by city
and year
, and create a new table off of that
which indicates for example how many years, Seattle had temp between 10 and 20, how may years it had temp between 20 and 30, so on.
How can I do that?
Upvotes: 1
Views: 314
Reputation: 18681
We can use cut
to distribute temp
into bins and summarize by city
and temp_range
:
library(dplyr)
df %>%
mutate(temp_range = cut(temp, breaks = seq(0, 100, 10))) %>%
group_by(city, temp_range) %>%
summarize(years = n_distinct(year))
Output:
# A tibble: 6 x 3
# Groups: city [3]
city temp_range years
<fct> <fct> <int>
1 DC (0,10] 1
2 DC (50,60] 1
3 DC (70,80] 1
4 NYC (70,80] 1
5 Seattle (0,10] 1
6 Seattle (80,90] 1
With dplyr 0.8.0
, we can also keep the empty factor levels by setting the new .drop
argument to FALSE
in group_by
:
df %>%
mutate(temp_range = cut(temp, breaks = seq(0, 100, 10))) %>%
group_by(city, temp_range, .drop = FALSE) %>%
summarize(years = n_distinct(year))
Output:
# A tibble: 30 x 3
# Groups: city [3]
city temp_range years
<fct> <fct> <int>
1 DC (0,10] 1
2 DC (10,20] 0
3 DC (20,30] 0
4 DC (30,40] 0
5 DC (40,50] 0
6 DC (50,60] 1
7 DC (60,70] 0
8 DC (70,80] 1
9 DC (80,90] 0
10 DC (90,100] 0
# ... with 20 more rows
Data:
df <- structure(list(city = structure(c(3L, 3L, 2L, 1L, 1L, 1L), .Label = c("DC",
"NYC", "Seattle"), class = "factor"), year = c(2019L, 2018L,
2010L, 2011L, 2011L, 2018L), temp = c(82L, 10L, 78L, 71L, 10L,
60L)), class = "data.frame", row.names = c(NA, -6L))
Upvotes: 1