Cryptoharf84

Reputation: 379

sort column by variable name then loop in each variable

I am an R noob :) and this is my first post. I have a dataset of about 4,000 entries (data) describing mortality rates (data$mortality) by US state (data$State).

I want to loop through the mortality rates by state name, for instance through all the mortality rates in "AK". So far I have something like this:

tbl <- table(data$State)  ## frequency of entries for each state

How can I loop through all the occurrences of each state?

I don't want to specify each state name by hand. I want to sort the states and then loop through them by name: "AK", "AL", etc.

For instance, my table would be:

State   mortality 
AL  14.3
AL  18.5
AL  18.1
AL  NA
AL  NA
AK  NA
AK  17.7
AK  18
AK  15.9
AK  NA
AK  19.6
AK  17.3
AZ  15
AZ  17.1
AZ  17.1
AZ  NA
AZ  16.4
AZ  15.2
AZ  16.7

I could then loop through all the rates in "AL", rank them, and choose the hospital name associated with each ranked mortality rate in "AL". I can write a piece of code for one state at a time, but imagine doing that for all states!

Upvotes: 2

Views: 714

Answers (2)

Frank

Reputation: 66819

Here's a data.table solution, as suggested in a comment:

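library(data.table)
# hospID is a stand-in identifier for each hospital (one per row)
DT <- data.table(hospID = seq_len(nrow(data)), data)
# rank mortality within each state, leaving NA values unranked
DT[, r := rank(mortality, na.last = "keep"), by = State]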

Then run DT to see the result:

    hospID State mortality   r
 1:      1    AL      14.3 1.0
 2:      2    AL      18.5 3.0
 3:      3    AL      18.1 2.0
 4:      4    AL        NA  NA
 5:      5    AL        NA  NA
 6:      6    AK        NA  NA
 7:      7    AK      17.7 3.0
 8:      8    AK      18.0 4.0
 9:      9    AK      15.9 1.0
10:     10    AK        NA  NA
11:     11    AK      19.6 5.0
12:     12    AK      17.3 2.0
13:     13    AZ      15.0 1.0
14:     14    AZ      17.1 5.5
15:     15    AZ      17.1 5.5
16:     16    AZ        NA  NA
17:     17    AZ      16.4 3.0
18:     18    AZ      15.2 2.0
19:     19    AZ      16.7 4.0

Look at ?rank to see different ways of handling ties and NA values.
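As a rough illustration of those options, here is a minimal sketch using the AZ values from the question. The variable names are only for illustration; ties.method and na.last are the documented arguments of rank().

```r
# Mortality values for the AZ rows in the question: one tie (17.1) and one NA
x <- c(15, 17.1, 17.1, NA, 16.4, 15.2, 16.7)

# Default ties.method = "average": tied values share the mean of their ranks
r_avg <- rank(x, na.last = "keep")

# ties.method = "min": tied values share the lowest of their ranks
r_min <- rank(x, na.last = "keep", ties.method = "min")

# ties.method = "first": ties are broken by position in the vector
r_first <- rank(x, na.last = "keep", ties.method = "first")

# Default na.last = TRUE: the NA is ranked after all other values
r_na_last <- rank(x)

print(rbind(r_avg, r_min, r_first, r_na_last))
```

With na.last = "keep", the NA stays NA in the result, which is what the by-state ranking above relies on.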

If you want to sort on the rank, you can do that with DT[order(State,r)]. The data.table package also allows for a key -- a vector of columns on which the data.table is sorted automatically. There are other benefits to setting a key as well that you can read about in a data.table tutorial or the FAQ.
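A small sketch of both options, assuming a cut-down stand-in for the question's data (the hospID values are made up, as in the code above). One difference worth knowing: order() puts NA ranks last by default, while a key sorts NA values first.

```r
library(data.table)

# Stand-in for the question's data; hospID is a hypothetical identifier
DT <- data.table(
  hospID    = 1:6,
  State     = c("AL", "AL", "AL", "AK", "AK", "AK"),
  mortality = c(14.3, 18.5, NA, 17.7, 15.9, NA)
)
DT[, r := rank(mortality, na.last = "keep"), by = State]

# Option 1: a sorted copy, by state and then by rank (NA ranks go last)
sorted <- DT[order(State, r)]

# Option 2: set a key, which reorders DT itself by State, then r
# (keyed sorting places NA values first within each state)
setkey(DT, State, r)

# Lowest-mortality hospital in each state
best <- DT[r == 1, .(State, hospID, mortality)]
print(best)
```

The last line is one way to get at the "choose a hospital by rank" step from the question; replacing r == 1 with another rank picks a different position.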

Upvotes: 2

Fernando

Reputation: 7905

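To sort by column 'a':

# example data: 10 random letters and 10 random numbers
x <- data.frame(a = sample(LETTERS, 10), b = runif(10))
# reorder the rows by the values in column 'a'
x <- x[order(x[, 'a']), ]
print(x)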

   a         b
4  B 0.8030872
9  C 0.3754850
7  D 0.8670409
5  G 0.1278583
3  J 0.9161972
6  N 0.7159080
8  R 0.5340525
2  S 0.2903496
10 T 0.5466612
1  V 0.9187505
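To go from sorting to the per-state loop asked about in the question, one possible base R sketch is split() followed by sapply(). The data frame below is a cut-down stand-in, and the hospital column and its values are invented for illustration; the real column name is an assumption.

```r
# Stand-in data; the hospital names are hypothetical
data <- data.frame(
  hospital  = c("H1", "H2", "H3", "H4", "H5", "H6"),
  State     = c("AL", "AL", "AL", "AK", "AK", "AK"),
  mortality = c(14.3, 18.5, NA, 17.7, 15.9, NA),
  stringsAsFactors = FALSE
)

# split() gives one data frame per state; the list names come out sorted
by_state <- split(data, data$State)

# For each state, rank the rates and return the lowest-mortality hospital
best <- sapply(by_state, function(d) {
  r <- rank(d$mortality, na.last = "keep", ties.method = "min")
  d$hospital[which(r == 1)][1]  # first hospital if several tie for rank 1
})
print(best)
```

No state name is typed by hand here: split() finds the states, and sapply() visits them in alphabetical order ("AK", "AL", ...).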

Upvotes: 0
