user1830307

Ordering categorical variables in R

I have a dataset with three columns:

Date         State     Count
1994-01-05   Alabama   408
1994-01-06   Alabama   784
1994-02-08   Alabama   552
1994-01-05   Alaska    1067
1994-01-06   Alaska    36
1994-02-08   Alaska    8571
1994-01-05   Arizona   385
1994-01-06   Arizona   1845
1994-02-08   Arizona   49

where there are counts for the same set of dates for each of the fifty states. The dates and states are ordered as shown above.

I want to get the data into a format with four columns*:

Date         State     Count   mean
1994-01-05   Alabama   408     581.333
1994-01-06   Alabama   784     581.333
1994-02-08   Alabama   552     581.333
1994-01-05   Arizona   385     759.666
1994-01-06   Arizona   1845    759.666
1994-02-08   Arizona   49      759.666
1994-01-05   Alaska    1067    3224.666
1994-01-06   Alaska    36      3224.666
1994-02-08   Alaska    8571    3224.666

where, first, the mean of the counts for each state is computed and stored in the fourth column, and then the states are reordered from smallest to largest mean.

I was able to complete the first step of computing the mean for each state, using the command:

plyed <- ddply(dataset, .(State), transform, mean = mean(Count))

However, this command only computes the mean for each state; it does not reorder the states by that mean, so the output is:

Date         State     Count   mean
1994-01-05   Alabama   408     581.333
1994-01-06   Alabama   784     581.333
1994-02-08   Alabama   552     581.333
1994-01-05   Alaska    1067    3224.666
1994-01-06   Alaska    36      3224.666
1994-02-08   Alaska    8571    3224.666
1994-01-05   Arizona   385     759.666
1994-01-06   Arizona   1845    759.666
1994-02-08   Arizona   49      759.666

I am unsure how to reorder the states by their mean to get my desired output*. I tried the reorder command, but every variant I tried returned something other than the sorted data frame I want. Here is one example of a command I tried with no success:

reorder(plyed$State, plyed$mean, order=is.ordered(plyed$State)) 

Upvotes: 1

Views: 2605

Answers (2)

weitzner

Reputation: 440

Try using the order() function. A good example can be found in the answer to the question "How to sort a dataframe by column(s)?":

new_df <- plyed[with(plyed, order(mean)),]
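
For a self-contained check, the same idea can be sketched in base R without plyr. The data frame below is retyped from the sample in the question. Using ave() in place of the ddply step, and adding State and Date as tie-breakers in order(), are my own assumptions rather than part of the command above.

```r
# Sample data retyped from the question (three states, three dates each).
dataset <- data.frame(
  Date  = as.Date(rep(c("1994-01-05", "1994-01-06", "1994-02-08"), times = 3)),
  State = rep(c("Alabama", "Alaska", "Arizona"), each = 3),
  Count = c(408, 784, 552, 1067, 36, 8571, 385, 1845, 49),
  stringsAsFactors = FALSE
)

# ave() returns the per-State mean repeated on every row of that State.
dataset$mean <- ave(dataset$Count, dataset$State, FUN = mean)

# order() returns the row permutation that sorts by mean;
# State and Date are only tie-breakers so that each state stays together.
sorted <- dataset[order(dataset$mean, dataset$State, dataset$Date), ]
rownames(sorted) <- NULL

# States in the order they now appear in the sorted data frame.
unique(sorted$State)
```

With the sample data the states should come out as Alabama, Arizona, Alaska, matching the desired output in the question.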

Upvotes: 1

dickoa

Reputation: 18437

You can use plyr::arrange:

arrange(ddply(df, .(State), mutate, mean = mean(Count)), mean)
##         Date   State Count    mean
## 1 1994-01-05 Alabama   408  581.33
## 2 1994-01-06 Alabama   784  581.33
## 3 1994-02-08 Alabama   552  581.33
## 4 1994-01-05 Arizona   385  759.67
## 5 1994-01-06 Arizona  1845  759.67
## 6 1994-02-08 Arizona    49  759.67
## 7 1994-01-05  Alaska  1067 3224.67
## 8 1994-01-06  Alaska    36 3224.67
## 9 1994-02-08  Alaska  8571 3224.67
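
Note that sorting rows does not change the level order of State if it is a factor. Conversely, reorder() on its own only returns a factor with re-ranked levels and never rearranges rows, which may be why the attempt in the question did not give the expected table. If the levels should follow the means as well (for plotting, say), a minimal base R sketch could look like this, with the sample data retyped from the question and the name df_sorted being mine:

```r
# Sample data retyped from the question; State is deliberately a factor here.
df <- data.frame(
  State = factor(rep(c("Alabama", "Alaska", "Arizona"), each = 3)),
  Count = c(408, 784, 552, 1067, 36, 8571, 385, 1845, 49)
)

# reorder() computes FUN(Count) per level and sorts the levels by the result.
df$State <- reorder(df$State, df$Count, FUN = mean)
levels(df$State)

# Sorting by the re-levelled factor now groups the rows in mean order.
df_sorted <- df[order(df$State), ]
```

Here reorder() fixes the level order and order() does the actual row sorting; the two steps are independent, so either can be used without the other.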

Just for fun I'll add the dplyr solution

# detach plyr so its mutate() and arrange() cannot clash with dplyr's
detach(package:plyr)
library(dplyr)
df %>%
    group_by(State) %>%
    mutate(mean = mean(Count)) %>%
    ungroup() %>%
    arrange(mean)

Upvotes: 0
