Reputation: 2191
I am using the library(nycflights13) and I use the following command to group_by month and day, select the top 3 rows within each group and then sort in descending order within each group by departure delay. The code is the following:
flights %>% group_by(month, day) %>% top_n(3, dep_delay) %>% arrange(desc(dep_delay))
Which returns the following output:
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin dest
<int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr> <int> <chr> <chr> <chr>
1 2013 1 9 641 900 1301 1242 1530 1272 HA 51 N384HA JFK HNL
2 2013 6 15 1432 1935 1137 1607 2120 1127 MQ 3535 N504MQ JFK CMH
3 2013 1 10 1121 1635 1126 1239 1810 1109 MQ 3695 N517MQ EWR ORD
4 2013 9 20 1139 1845 1014 1457 2210 1007 AA 177 N338AA JFK SFO
5 2013 7 22 845 1600 1005 1044 1815 989 MQ 3075 N665MQ JFK CVG
6 2013 4 10 1100 1900 960 1342 2211 931 DL 2391 N959DL JFK TPA
The records are sorted in descending order but not within groups.
Why is that? What should be done to correct the code? Your advice will be appreciated.
#Edit
#Following the suggestions made in the comments I still don't get what I am looking for, i.e. within each month-day grouping sorting of the top 3 records in descending order in terms of the departure delay:
flights %>% group_by(month, day) %>% top_n(3, dep_delay) %>% arrange(desc(month, day, dep_delay))
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin dest
<int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr> <int> <chr> <chr> <chr>
1 2013 12 1 657 1930 687 1010 2249 681 DL 1091 N342NW JFK SAT
2 2013 12 1 1504 1056 248 1628 1230 238 EV 5309 N615QX LGA BGR
3 2013 12 1 2017 1455 322 2146 1609 337 DL 1164 N6704Z JFK BOS
4 2013 12 2 1139 745 234 1358 1012 226 DL 807 N935AT EWR ATL
5 2013 12 2 1823 1345 278 2123 1640 283 UA 1510 N75861 EWR IAH
6 2013 12 2 1842 1428 254 1955 1545 250 EV 5712 N827AS JFK IAD
7 2013 12 3 920 600 200 1158 846 192 B6 583 N535JB JFK MCO
8 2013 12 3 1424 1114 190 1713 1347 206 UA 405 N437UA LGA DEN
9 2013 12 3 2300 1935 205 116 2203 193 FL 1346 N964AT LGA ATL
10 2013 12 4 1210 829 221 1440 1055 225 EV 4419 N23139 EWR XNA
Upvotes: 21
Views: 39147
Reputation: 2253
Just add the .by_group = TRUE
argument to arrange
flights %>%
group_by(month, day) %>%
top_n(3, dep_delay) %>%
arrange(dep_delay, .by_group = TRUE)
Upvotes: 33
Reputation: 1668
I think the problem in your second example is that your are using desc
on all the variables at the same time, so it is only applied to the month
column.
flights %>% group_by(month, day) %>% top_n(3, dep_delay) %>%
arrange(
month,
day,
desc(dep_delay)
)
Source: local data frame [1,108 x 19]
Groups: month, day [365]
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin
<int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr> <int> <chr> <chr>
1 2013 1 1 848 1835 853 1001 1950 851 MQ 3944 N942MQ JFK
2 2013 1 1 2343 1724 379 314 1938 456 EV 4321 N21197 EWR
3 2013 1 1 1815 1325 290 2120 1542 338 EV 4417 N17185 EWR
4 2013 1 2 2131 1512 379 2340 1741 359 UA 488 N593UA LGA
5 2013 1 2 1607 1030 337 2003 1355 368 AA 179 N324AA JFK
6 2013 1 2 1412 838 334 1710 1147 323 UA 468 N474UA EWR
7 2013 1 3 2056 1605 291 2239 1754 285 9E 3459 N928XJ JFK
8 2013 1 3 2008 1540 268 2339 1909 270 DL 2027 N338NW JFK
9 2013 1 3 2012 1600 252 2314 1857 257 B6 369 N558JB LGA
10 2013 1 4 2123 1635 288 2332 1856 276 EV 3805 N29917 EWR
# ... with 1,098 more rows, and 6 more variables: dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
# time_hour <dttm>
Upvotes: 15