OverFlow Police
OverFlow Police

Reputation: 861

group_by and count number of rows a condition is met in R

I have a data table like below:

city         year    temp
Seattle      2019    82 
Seattle      2018    10 
NYC          2010    78 
DC           2011    71 
DC           2011    10 
DC           2018    60 

I would like to group them by city and year, and create a new table off of that which indicates for example how many years, Seattle had temp between 10 and 20, how may years it had temp between 20 and 30, so on.

How can I do that?

Upvotes: 1

Views: 314

Answers (1)

acylam
acylam

Reputation: 18681

We can use cut to distribute temp into bins and summarize by city and temp_range:

library(dplyr)

df %>%
  mutate(temp_range = cut(temp, breaks = seq(0, 100, 10))) %>%
  group_by(city, temp_range) %>%
  summarize(years = n_distinct(year))

Output:

# A tibble: 6 x 3
# Groups:   city [3]
  city    temp_range years
  <fct>   <fct>      <int>
1 DC      (0,10]         1
2 DC      (50,60]        1
3 DC      (70,80]        1
4 NYC     (70,80]        1
5 Seattle (0,10]         1
6 Seattle (80,90]        1

With dplyr 0.8.0, we can also keep the empty factor levels by setting the new .drop argument to FALSE in group_by:

df %>%
  mutate(temp_range = cut(temp, breaks = seq(0, 100, 10))) %>%
  group_by(city, temp_range, .drop = FALSE) %>%
  summarize(years = n_distinct(year))

Output:

# A tibble: 30 x 3
# Groups:   city [3]
   city  temp_range years
   <fct> <fct>      <int>
 1 DC    (0,10]         1
 2 DC    (10,20]        0
 3 DC    (20,30]        0
 4 DC    (30,40]        0
 5 DC    (40,50]        0
 6 DC    (50,60]        1
 7 DC    (60,70]        0
 8 DC    (70,80]        1
 9 DC    (80,90]        0
10 DC    (90,100]       0
# ... with 20 more rows

Data:

df <- structure(list(city = structure(c(3L, 3L, 2L, 1L, 1L, 1L), .Label = c("DC", 
"NYC", "Seattle"), class = "factor"), year = c(2019L, 2018L, 
2010L, 2011L, 2011L, 2018L), temp = c(82L, 10L, 78L, 71L, 10L, 
60L)), class = "data.frame", row.names = c(NA, -6L))

Upvotes: 1

Related Questions