Reputation: 2556
I have a data.frame (say "df") that looks like the following:
Hospital.Name | State | Mortality.Rate
'hospital_1' | 'AA' | 0.2
'hospital_2' | 'AA' | 0.3
'hospital_3' | 'BB' | 0.3
'hospital_4' | 'CC' | 0.5
(The Hospital.Name is unique)
Now I want to order "Mortality.Rate" grouped by "State", i.e. order the rates within each state. If there is a tie in the rate, "Hospital.Name" is used to resolve the tie.
The "order()" and "tapply()" functions came to mind, so I wrote this:
tapply(df$Mortality.Rate, df$State, order, df$Hospital.Name, na.last=NA)
However, an "argument lengths differ" error popped up. When order() is applied to a slice of the rates, its second argument (i.e. df$Hospital.Name) is not sliced along with it, so the two vectors have different lengths.
How can I pass the second argument (to resolve ties in the ordering) to tapply(), or are there any other approaches?
Upvotes: 15
Views: 60392
Reputation: 1
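library(dplyr)
result <- df %>%                     # the pipe operator is %>%, not %<%
  arrange(Mortality.Rate) %>%
  group_by(State) %>%                # order_by() is not a pipeline verb; group_by() is assumed to be the intent
  summarize(mean(Mortality.Rate))    # note: this returns the mean rate per state, not an ordering
View(result)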
Upvotes: 0
Reputation: 133
This came to my mind
df <- df[with(df, order(State, as.numeric(Mortality.Rate), Hospital.Name)), ]
Check out this post: "How to sort a dataframe by column(s)?"
Upvotes: 1
Reputation: 10167
In base R, you can supply multiple arguments to order(), and subsequent arguments are used to break ties in the earlier ones, as in:
df[order(df$State,df$Mortality.Rate,df$Hospital.Name),]
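As a minimal sketch of the above, here is the same call applied to the sample rows from the question (deliberately shuffled). The names sorted and by_state are hypothetical, introduced only for illustration. The second part shows one possible per-state variant: unlike the tapply() attempt in the question, split() slices whole rows, so the tie-breaking column stays aligned with the rates.

```r
# Sample data from the question, in shuffled order
df <- data.frame(
  Hospital.Name  = c("hospital_4", "hospital_2", "hospital_3", "hospital_1"),
  State          = c("CC", "AA", "BB", "AA"),
  Mortality.Rate = c(0.5, 0.3, 0.3, 0.2),
  stringsAsFactors = FALSE
)

# Sort by state, then rate, then hospital name as the tie-breaker
sorted <- df[order(df$State, df$Mortality.Rate, df$Hospital.Name), ]
rownames(sorted) <- NULL  # drop the old row names left over from subsetting

# Per-state alternative: split the rows by state so all columns are sliced together,
# then order each piece by rate and hospital name
by_state <- lapply(split(df, df$State), function(d) {
  d[order(d$Mortality.Rate, d$Hospital.Name), ]
})
```

Here sorted is a single data.frame, while by_state is a list with one data.frame per state; do.call(rbind, by_state) would stack the pieces back together if a single table is wanted.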
Upvotes: 17
Reputation: 92282
If we are already loading packages that are needless for this specific operation, here's a package (data.table) that could be useful in the sense that it sorts the data by reference (without copying it and without needing <-), using the setorder or setkey functions:
library(data.table)
setorder(setDT(df), State, Mortality.Rate, Hospital.Name)
Though, you could potentially mimic base R syntax and order the data while creating a copy (though with improved speed, because data.table calls its forder under the hood):
setDT(df)[order(State, Mortality.Rate, Hospital.Name)]
Upvotes: 3
Reputation: 6455
You can do this in dplyr. First, some sample data:
library("dplyr")
hospital_name <- sample(c("hospital_1", "hospital_2", "hospital_3"), 10,
replace = TRUE)
state <- sample(letters[1:3], 10, replace = TRUE)
mortality_rate <- runif(10)
df <- tibble(hospital_name, state, mortality_rate)  # data_frame() is deprecated in favor of tibble()
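Group by state, then arrange by columns. (In current dplyr versions, arrange() ignores grouping unless .by_group = TRUE is passed.)
df %>%
  group_by(state) %>%
  arrange(mortality_rate, hospital_name, .by_group = TRUE)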
Producing results like these, where the states are grouped and the mortality rate is sorted within each state.
## Source: local data frame [10 x 3]
## Groups: state
##
## hospital_name state mortality_rate
## 1 hospital_1 b 0.15293591
## 2 hospital_1 b 0.37417167
## 3 hospital_1 b 0.54561856
## 4 hospital_3 c 0.02487033
## 5 hospital_1 c 0.09937557
## 6 hospital_1 c 0.35666087
## 7 hospital_3 c 0.39663460
## 8 hospital_2 c 0.53064144
## 9 hospital_3 c 0.76015632
## 10 hospital_3 c 0.76801890
Without group_by() you just get the mortality rates from least to greatest:
df %>%
arrange(mortality_rate)
## Source: local data frame [10 x 3]
##
## hospital_name state mortality_rate
## 1 hospital_3 c 0.02487033
## 2 hospital_1 c 0.09937557
## 3 hospital_1 b 0.15293591
## 4 hospital_1 c 0.35666087
## 5 hospital_1 b 0.37417167
## 6 hospital_3 c 0.39663460
## 7 hospital_2 c 0.53064144
## 8 hospital_1 b 0.54561856
## 9 hospital_3 c 0.76015632
## 10 hospital_3 c 0.76801890
Upvotes: 4
Reputation: 14192
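You can do it in dplyr (in current versions, arrange() only sorts within groups when .by_group = TRUE is set):
df %>% group_by(State) %>% arrange(Mortality.Rate, Hospital.Name, .by_group = TRUE)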
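A minimal sketch on the question's sample rows, assuming a dplyr version in which arrange() accepts the .by_group argument (this argument is an addition here, not part of the original one-liner). The names within_state and flat are hypothetical. It compares the grouped form with simply listing State as the first sort key.

```r
library(dplyr)

# Sample data from the question, in shuffled order
df <- data.frame(
  Hospital.Name  = c("hospital_4", "hospital_2", "hospital_3", "hospital_1"),
  State          = c("CC", "AA", "BB", "AA"),
  Mortality.Rate = c(0.5, 0.3, 0.3, 0.2),
  stringsAsFactors = FALSE
)

# Grouped form: .by_group = TRUE sorts by the grouping column first,
# then by rate, then by hospital name as the tie-breaker
within_state <- df %>%
  group_by(State) %>%
  arrange(Mortality.Rate, Hospital.Name, .by_group = TRUE) %>%
  ungroup()

# Ungrouped form: State is given explicitly as the first sort key
flat <- df %>% arrange(State, Mortality.Rate, Hospital.Name)
```

Both forms should give the same row order; the ungrouped form avoids depending on how arrange() treats grouped data.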
Upvotes: 12