Reputation: 1800
I am trying to understand how the group_by function works in dplyr. I am using the airquality data set that comes with the datasets package.
I understand that if I do the following, it should arrange the records in increasing order of the Temp variable:
airquality_max1 <- airquality %>% arrange(Temp)
I see that this is the case in airquality_max1. I now want to arrange the records in increasing order of Temp, but grouped by Month. So the end result should first have all the records for Month == 5 in increasing order of Temp, then all the records for Month == 6 in increasing order of Temp, and so on. To do that, I use the following command:
airquality_max2 <- airquality %>% group_by(Month) %>% arrange(Temp)
However, what I find is that the results are still in increasing order of Temp only, not grouped by Month, i.e., airquality_max1 and airquality_max2 have their rows in the same order.
I am not sure why the grouping by Month does not happen before the arrange function is applied. Can anyone help me understand what I am doing wrong here?
More than solving the problem of sorting the data frame by columns, I am trying to understand the behavior of group_by, because I want to use this example to explain group_by to someone.
Upvotes: 3
Views: 184
Reputation: 215137
arrange ignores group_by by default; see the breaking changes in dplyr 0.5.0. If you need to order by two columns, you can do:
airquality %>% arrange(Month, Temp)
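As a quick sanity check (a minimal sketch, assuming dplyr is installed and the built-in airquality data is unmodified; the name sorted is just for illustration), you can confirm that this puts the months in order and sorts Temp within each month:

```r
library(dplyr)

# Sort by Month first, then by Temp within each Month
sorted <- airquality %>% arrange(Month, Temp)

# Month should be non-decreasing down the whole table
month_in_order <- !is.unsorted(sorted$Month)

# Within every Month, Temp should be non-decreasing
temp_in_order <- all(tapply(sorted$Temp, sorted$Month,
                            function(x) !is.unsorted(x)))

stopifnot(month_in_order, temp_in_order)
head(sorted)
```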
For a grouped data frame, you can also set the .by_group argument to TRUE to sort by the grouping variable first:
airquality %>% group_by(Month) %>% arrange(Temp, .by_group = TRUE)
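To see the difference side by side, here is a small sketch (assuming dplyr 0.5.0 or later; the variable names are only illustrative). It compares the two approaches above with the original grouped call, and then shows a verb that does respect the grouping, which may help when explaining group_by to someone:

```r
library(dplyr)

by_two_cols <- airquality %>% arrange(Month, Temp)
by_group    <- airquality %>% group_by(Month) %>% arrange(Temp, .by_group = TRUE)
ignored     <- airquality %>% group_by(Month) %>% arrange(Temp)

# Both suggested approaches should give the same row order
stopifnot(identical(by_group$Month, by_two_cols$Month),
          identical(by_group$Temp,  by_two_cols$Temp))

# Without .by_group = TRUE the grouping is ignored:
# Temp is sorted across the whole table, not within each Month
stopifnot(!is.unsorted(ignored$Temp))

# Other verbs, such as summarise, do work per group:
# this returns one row per Month with that month's highest Temp
monthly_max <- airquality %>%
  group_by(Month) %>%
  summarise(max_temp = max(Temp))
monthly_max
```

So group_by on its own does not reorder rows; it only records which rows belong together so that later verbs can operate per group.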
Upvotes: 4