Reputation: 349
Packages required: 'dplyr' and 'nycflights13'.
The tibble I am using is:
library(dplyr)
library(nycflights13)
q4 <- flights %>% group_by(year, month, day) %>% summarise(cancelled = sum(is.na(dep_time)), avg_delay = mean(arr_delay, na.rm = TRUE), totalflights = n())
q4 <- q4 %>% mutate(prop = cancelled / totalflights)
Running
q4 %>% ungroup() %>% count(prop)
gives me:
# A tibble: 342 x 2
      prop     n
     <dbl> <int>
 1 0           7
 2 0.00101     1
 3 0.00102     2
 4 0.00102     1
 5 0.00102     1
 6 0.00102     1
 7 0.00103     1
 8 0.00103     1
 9 0.00104     1
10 0.00104     1
# ... with 332 more rows
Is there a way to get the output in the desired form below without brute-force logic such as for loops? I am looking for a one-line or two-line solution. Is there a function in dplyr that does this?
Desired Output:
# A tibble: X x Y
  prop        n
  <dbl>   <int>
1 0-0.1      45 # random numbers
2 0.1-0.2    54
3 0.2-0.3    23
Upvotes: 2
Views: 1590
Reputation: 349
I figured one out myself, which I also feel is the best. It uses cut_width(), which comes from ggplot2 rather than dplyr, so ggplot2 has to be loaded:
q4 %>% ungroup() %>% count(cut_width(prop, 0.025))
Output:
# A tibble: 11 x 2
  `cut_width(prop, 0.025)`     n
  <fct>                    <int>
1 [-0.0125,0.0125]           233
2 (0.0125,0.0375]             66
3 (0.0375,0.0625]             26
4 (0.0625,0.0875]             13
5 (0.0875,0.112]              14
6 (0.112,0.138]                4
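By default cut_width() centres the first bin on the lowest value, which is why the output above starts at -0.0125. If bins that start at 0 are wanted, as in the question's desired output, the boundary argument can be used. This is only a sketch: the props tibble is a hypothetical stand-in for q4, and the width of 0.1 is taken from the desired output rather than from the answer above.

```r
library(dplyr)
library(ggplot2)  # cut_width() lives in ggplot2, not dplyr

# Hypothetical stand-in for q4; replace with the real tibble.
props <- tibble(prop = c(0, 0.02, 0.05, 0.12, 0.18, 0.25, 0.31))

# boundary = 0 aligns the bin edges to 0, 0.1, 0.2, ...
# instead of centring the first bin on the minimum value.
binned <- props %>%
  count(bin = cut_width(prop, width = 0.1, boundary = 0))

binned
```

With the real data this would read q4 %>% ungroup() %>% count(bin = cut_width(prop, width = 0.1, boundary = 0)).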
Upvotes: 0
Reputation: 40
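After q4 <- q4 %>% mutate(prop = cancelled / totalflights), you can use:
q4 %>% ungroup() %>%
  mutate(category = cut(prop, breaks = c(-Inf, 0.1, 0.2, Inf), labels = c("0-0.1", "0.1-0.2", "0.2-0.3"))) %>%
  count(category)
I believe it will work. Note that the last bin is open-ended (everything above 0.2), so the "0.2-0.3" label also covers any larger values.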
Upvotes: 0
Reputation: 12084
Below, I use cut to bin the data and then table to count instances of each bin.
data.frame(cut(q4$prop, breaks = c(0, 0.1, 0.2, 0.3)) %>% table)
produces
# . Freq
# 1 (0,0.1] 341
# 2 (0.1,0.2] 13
# 3 (0.2,0.3] 2
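One thing to watch with these breaks: cut() uses right-closed intervals by default, so a proportion of exactly 0 falls outside (0,0.1], and anything above 0.3 becomes NA; table() then silently drops both. A possible variant, sketched on a hypothetical prop vector standing in for q4$prop, keeps those values:

```r
# Hypothetical stand-in for q4$prop; replace with the real column.
prop <- c(0, 0, 0.04, 0.08, 0.15, 0.27, 0.45)

# include.lowest = TRUE closes the first interval at 0, so exact zeros are kept.
# Extending the breaks up to 1 keeps proportions above 0.3 from turning into NA.
bins <- cut(prop, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
counts <- as.data.frame(table(bins))

# Show only the bins that actually occur.
counts[counts$Freq > 0, ]
```

Whether empty bins should be shown or filtered out is a matter of taste; the last line can simply be dropped to keep all ten bins.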
Upvotes: 7