Trying to understand dplyr function - group_by

Question

I am trying to understand how the group_by function works in dplyr. I am using the airquality data set that comes with the datasets package.

I understand that if I do the following, it should arrange the records in increasing order of the Temp variable:

airquality_max1 <- airquality %>% arrange(Temp)

I see that this is the case in airquality_max1. I now want to arrange the records in increasing order of Temp, but grouped by Month. The end result should first have all the records for Month == 5 in increasing order of Temp, then all the records for Month == 6 in increasing order of Temp, and so on. So I use the following command:

airquality_max2 <- airquality %>% group_by(Month) %>% arrange(Temp)

However, what I find is that the results are still in increasing order of Temp only, not grouped by Month, i.e., airquality_max1 and airquality_max2 are equal.

I am not sure why the grouping by Month does not happen before the arrange function. Can anyone help me understand what I am doing wrong here?

I am less interested in sorting the data frame by columns than in understanding the behavior of group_by, because I need to explain how group_by is applied to someone else.

akuiper · Accepted Answer

arrange() ignores grouping; see the breaking changes in dplyr 0.5.0. If you need to order by two columns, you can do:

airquality %>% arrange(Month, Temp)
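A quick way to check both claims is sketched below. It assumes dplyr is installed; the object names ungrouped, grouped and by_month are only illustrative and do not come from the question.

```r
library(dplyr)

# arrange() without .by_group ignores the grouping, so both pipelines
# yield the same row order (the grouped result only carries extra
# grouping metadata).
ungrouped <- airquality %>% arrange(Temp)
grouped   <- airquality %>% group_by(Month) %>% arrange(Temp)
identical(ungrouped$Temp, grouped$Temp)

# Sorting on two columns: Month first, then Temp within each month.
by_month <- airquality %>% arrange(Month, Temp)
head(by_month)
```

The comparison is done on a column rather than on the whole objects because the grouped result is a grouped tibble, so the two objects differ in class even when their rows match.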

For a grouped data frame, you can also use the .by_group argument to sort by the grouping variable first:

airquality %>% group_by(Month) %>% arrange(Temp, .by_group = TRUE)
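As a rough illustration, the sketch below compares the .by_group approach with a plain two-column sort, and then shows a verb where the grouping does change the result. The names two_cols, by_group and monthly_max are made up for this example, and a dplyr version that supports .by_group is assumed.

```r
library(dplyr)

# Two ways to get rows ordered by Month, then by Temp within each month.
two_cols <- airquality %>% arrange(Month, Temp)
by_group <- airquality %>%
  group_by(Month) %>%
  arrange(Temp, .by_group = TRUE) %>%
  ungroup()
identical(two_cols$Temp, by_group$Temp)

# Where group_by() does matter: summarise() computes one value per group,
# here the maximum temperature for each month.
monthly_max <- airquality %>%
  group_by(Month) %>%
  summarise(max_temp = max(Temp))
monthly_max
```

This may be the easier way to explain group_by to someone: it does not reorder or change the rows by itself, it only records the groups so that verbs such as summarise() and mutate() operate per group.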
