user1960836
user1960836

Reputation: 1782

Keep desired columns when using summarise

I want to get the top 10 destinations, and also how many flights were made to these destinations. I am using summarise, and my problem is that summarise throws away all columns that a not mentioned in the summarise(..). I need to keep the column origin for later use.

library(tidyverse)
library(nycflights13)

flights %>% 
  group_by(dest) %>% 
    summarise(n = n()) %>% 
      arrange(desc(n)) %>% head(10)

Here is the result from the code above

   # A tibble: 10 x 2
   dest  allFlights
   <chr>      <int>
 1 ORD        17283
 2 ATL        17215
 3 LAX        16174
 4 BOS        15508
 5 MCO        14082
 6 CLT        14064
 7 SFO        13331
 8 FLL        12055
 9 MIA        11728
10 DCA         9705

I think this is correct. But all I am missing, is another column that prints the origin I was thinking about doing some a join to get the origin, but this doesn't make sense, as doing the join on this result set might not yield the correct flights.

I found this post: How to summarise all columns using group_by and summarise? but it was not helpful to me, as summarise is unable to find the columns I mention, that are not in its function.

Upvotes: 0

Views: 58

Answers (1)

qdread
qdread

Reputation: 3943

When you sum the flights by destination, you are summing the total number of flights arriving in the destination city, which have many different origin cities. So it would not make sense for there to be a single value in the origin column here.

If you want, you could replace group_by(dest) with group_by(origin,dest). That would give you the top 10 pairs of origin-destination cities, which is a different output than in your question, but would retain the origin and destination columns for further analysis.

library(tidyverse)
library(nycflights13)

flights %>% 
  group_by(origin, dest) %>% 
    summarise(n = n()) %>% 
      arrange(desc(n)) %>% head(10)

output

# A tibble: 10 x 3
# Groups:   origin [3]
   origin dest      n
   <chr>  <chr> <int>
 1 JFK    LAX   11262
 2 LGA    ATL   10263
 3 LGA    ORD    8857
 4 JFK    SFO    8204
 5 LGA    CLT    6168
 6 EWR    ORD    6100
 7 JFK    BOS    5898
 8 LGA    MIA    5781
 9 JFK    MCO    5464
10 EWR    BOS    5327

Upvotes: 1

Related Questions