Zelong

Reputation: 2556

How to order a column by group in R

I have a data.frame (say "df") that looks like the following:

Hospital.Name | State | Mortality.Rate
'hospital_1'  | 'AA'  | 0.2
'hospital_2'  | 'AA'  | 0.3
'hospital_3'  | 'BB'  | 0.3
'hospital_4'  | 'CC'  | 0.5

(Hospital.Name is unique.)

Now I want to order "Mortality.Rate" grouped by "State", i.e. order the rates within each state. If two rates are tied, "Hospital.Name" should be used to resolve the tie.

The "order()" and "tapply()" functions came to mind, so I tried this:

tapply(df$Mortality.Rate, df$State, order, df$Hospital.Name, na.last=NA)

However, I got the error "argument lengths differ". When order() is applied to each slice of Mortality.Rate, its second argument (df$Hospital.Name) is not sliced, so the two arguments have different lengths.

How can I pass the second argument (used to break ties) to tapply(), or is there another approach?
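A minimal base-R sketch that keeps tapply(): pass it the row indices instead of Mortality.Rate, so the function can slice both the rate and the tie-breaking name with the same indices. The data frame follows the table above, with one hypothetical extra row (hospital_5) added to create a tie in state AA.

```r
# Data in the shape of the question's table; hospital_5 is a hypothetical
# extra row that ties with hospital_2 on Mortality.Rate in state "AA".
df <- data.frame(
  Hospital.Name  = c("hospital_5", "hospital_1", "hospital_3", "hospital_2", "hospital_4"),
  State          = c("AA", "AA", "BB", "AA", "CC"),
  Mortality.Rate = c(0.3, 0.2, 0.3, 0.3, 0.5),
  stringsAsFactors = FALSE
)

# Group row indices (not the rates) by State. Inside FUN, i holds the rows
# of one state, so both sort keys can be sliced with the same i.
idx_by_state <- tapply(seq_len(nrow(df)), df$State, function(i) {
  i[order(df$Mortality.Rate[i], df$Hospital.Name[i])]
})

# tapply() returns one index vector per state (states in sorted order);
# flatten them and use them to reorder the rows.
result <- df[unlist(idx_by_state), ]
print(result)
```

Because the groups come out in sorted State order, this should give the same row order as df[order(df$State, df$Mortality.Rate, df$Hospital.Name), ].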

Upvotes: 15

Views: 60392

Answers (6)

Emeka

Reputation: 1

Assign the output to a variable, result. This also assumes you want the average mortality rate for each state:

library(dplyr)
result <- df %>%
  arrange(Mortality.Rate) %>%
  group_by(State) %>%
  summarize(avg_mortality = mean(Mortality.Rate))
View(result)
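If the goal really is the average rate per state, a package-free sketch of the same summary uses base R's aggregate() with a formula. The data below is assumed to match the table in the question.

```r
# Sample data matching the table in the question (assumed values).
df <- data.frame(
  Hospital.Name  = c("hospital_1", "hospital_2", "hospital_3", "hospital_4"),
  State          = c("AA", "AA", "BB", "CC"),
  Mortality.Rate = c(0.2, 0.3, 0.3, 0.5),
  stringsAsFactors = FALSE
)

# Mean mortality rate per state; aggregate() returns a data frame
# with one row per State, in sorted State order.
avg_by_state <- aggregate(Mortality.Rate ~ State, data = df, FUN = mean)
print(avg_by_state)
```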

Upvotes: 0

Michael Kaiser

Reputation: 133

This came to mind:

df <- df[with(df, order(State, as.numeric(Mortality.Rate), Hospital.Name)), ]

See also the question "How to sort a dataframe by column(s)?"
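One plausible reason for the as.numeric() call (an assumption; the answer does not say) is that the rate column may have been read in as character, for example when a CSV column contains non-numeric placeholders. In that case order() compares strings, not numbers. A small sketch with hypothetical values:

```r
# Rates stored as text (hypothetical values for illustration).
rate_chr <- c("9.5", "10.2", "11.1")

# Character ordering compares strings, so "10.2" sorts before "9.5".
chr_order <- order(rate_chr)

# Converting to numeric first gives the intended ordering.
num_order <- order(as.numeric(rate_chr))

print(rate_chr[chr_order])
print(rate_chr[num_order])
```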

Upvotes: 1

Jthorpe

Reputation: 10167

In base R, you can supply multiple arguments to order() and subsequent arguments are used to break ties in the earlier variables, as in:

df[order(df$State,df$Mortality.Rate,df$Hospital.Name),]
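A short sketch of the tie-breaking on data shaped like the question's table. The extra row hospital_5 is hypothetical and is placed before hospital_2 on purpose, so the rate tie in state AA has to be resolved by Hospital.Name.

```r
df <- data.frame(
  Hospital.Name  = c("hospital_5", "hospital_1", "hospital_3", "hospital_2", "hospital_4"),
  State          = c("AA", "AA", "BB", "AA", "CC"),
  Mortality.Rate = c(0.3, 0.2, 0.3, 0.3, 0.5),
  stringsAsFactors = FALSE
)

# order() sorts by State first; Mortality.Rate breaks ties within a state,
# and Hospital.Name breaks any remaining ties in the rate.
sorted <- df[order(df$State, df$Mortality.Rate, df$Hospital.Name), ]
print(sorted)
```

For the highest rates first within each state, negate the numeric key: order(df$State, -df$Mortality.Rate, df$Hospital.Name).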

Upvotes: 17

David Arenburg

Reputation: 92282

If we are already loading packages that aren't needed for this specific operation, here's one (data.table) that can be useful because it sorts the data by reference (without copying it and without needing <-) using the setorder or setkey functions:

library(data.table)
setorder(setDT(df), State, Mortality.Rate, Hospital.Name)

Alternatively, you can mimic base R syntax and order the data while creating a copy (still with improved speed, because data.table calls its own forder under the hood):

setDT(df)[order(State, Mortality.Rate, Hospital.Name)]
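A sketch of the setkey() route mentioned above, on hypothetical data in the question's format. setkey() also sorts by reference, in ascending order of the listed columns, and additionally records those columns as the table's key.

```r
library(data.table)

# Hypothetical data in the question's format, with a tie in state "AA".
dt <- data.table(
  Hospital.Name  = c("hospital_5", "hospital_1", "hospital_3", "hospital_2", "hospital_4"),
  State          = c("AA", "AA", "BB", "AA", "CC"),
  Mortality.Rate = c(0.3, 0.2, 0.3, 0.3, 0.5)
)

# Sorts dt in place by State, then Mortality.Rate, then Hospital.Name,
# and marks these three columns as the key.
setkey(dt, State, Mortality.Rate, Hospital.Name)

print(key(dt))
print(dt)
```

Unlike setorder(), setkey() leaves a key on the table, which later keyed joins and subsets can use; if only the row order matters, setorder() is the lighter choice.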

Upvotes: 3

Lincoln Mullen

Reputation: 6455

You can do this in dplyr. First, some sample data:

library("dplyr")
hospital_name <- sample(c("hospital_1", "hospital_2", "hospital_3"), 10,
                        replace = TRUE)
state <- sample(letters[1:3], 10, replace = TRUE)
mortality_rate <- runif(10)

df <- tibble(hospital_name, state, mortality_rate)

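Group by state, then arrange within each group. Since dplyr 0.7.0, arrange() ignores the grouping unless you set .by_group = TRUE:

df %>% 
  group_by(state) %>% 
  arrange(mortality_rate, hospital_name, .by_group = TRUE)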

This produces results like the following, where rows are grouped by state and the mortality rate is sorted within each state:

## Source: local data frame [10 x 3]
## Groups: state
## 
##    hospital_name state mortality_rate
## 1     hospital_1     b     0.15293591
## 2     hospital_1     b     0.37417167
## 3     hospital_1     b     0.54561856
## 4     hospital_3     c     0.02487033
## 5     hospital_1     c     0.09937557
## 6     hospital_1     c     0.35666087
## 7     hospital_3     c     0.39663460
## 8     hospital_2     c     0.53064144
## 9     hospital_3     c     0.76015632
## 10    hospital_3     c     0.76801890

Without group_by() you just get the mortality rates from least to greatest:

df %>%
  arrange(mortality_rate)

## Source: local data frame [10 x 3]
## 
##    hospital_name state mortality_rate
## 1     hospital_3     c     0.02487033
## 2     hospital_1     c     0.09937557
## 3     hospital_1     b     0.15293591
## 4     hospital_1     c     0.35666087
## 5     hospital_1     b     0.37417167
## 6     hospital_3     c     0.39663460
## 7     hospital_2     c     0.53064144
## 8     hospital_1     b     0.54561856
## 9     hospital_3     c     0.76015632
## 10    hospital_3     c     0.76801890
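The output above depends on unseeded random data. Here is a reproducible sketch (the set.seed() call is an addition) checking that sorting with .by_group = TRUE gives the same row order as listing the grouping column first in a plain arrange():

```r
library(dplyr)

set.seed(42)  # added so the random sample data is reproducible
df <- tibble(
  hospital_name  = sample(c("hospital_1", "hospital_2", "hospital_3"), 10, replace = TRUE),
  state          = sample(letters[1:3], 10, replace = TRUE),
  mortality_rate = runif(10)
)

# .by_group = TRUE makes arrange() sort by the grouping column (state)
# before the columns listed explicitly.
by_group <- df %>%
  group_by(state) %>%
  arrange(mortality_rate, hospital_name, .by_group = TRUE) %>%
  ungroup()

# Equivalent without grouping: put the grouping column first.
explicit <- df %>%
  arrange(state, mortality_rate, hospital_name)

print(explicit)
```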

Upvotes: 4

jalapic

Reputation: 14192

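You can do it in dplyr (with .by_group = TRUE so that, in dplyr 0.7.0 and later, rows are sorted by State first):

df %>% group_by(State) %>% arrange(Mortality.Rate, Hospital.Name, .by_group = TRUE)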
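If you also want each hospital's position within its state (an assumption about the end goal, since the original tapply(..., order) attempt points that way), a sketch on hypothetical data in the question's format adds row_number() after the grouped sort. Note that arrange() ignores grouping in dplyr 0.7.0 and later unless .by_group = TRUE is set.

```r
library(dplyr)

# Hypothetical data in the question's format, with a tie in state "AA".
df <- data.frame(
  Hospital.Name  = c("hospital_5", "hospital_1", "hospital_3", "hospital_2", "hospital_4"),
  State          = c("AA", "AA", "BB", "AA", "CC"),
  Mortality.Rate = c(0.3, 0.2, 0.3, 0.3, 0.5),
  stringsAsFactors = FALSE
)

ranked <- df %>%
  group_by(State) %>%
  # sort by State first, then rate, then name to break ties
  arrange(Mortality.Rate, Hospital.Name, .by_group = TRUE) %>%
  # row_number() restarts at 1 within each State group
  mutate(rank_in_state = row_number()) %>%
  ungroup()

print(ranked)
```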

Upvotes: 12
