amisos55

Reputation: 1979

Ordering an alphanumeric variable in R

I would like to order a data frame based on an alphanumeric variable. Here is what my dataset looks like:

sample.data <- data.frame(Grade=c(4,4,4,4,3,3,3,3,3,3,3,3),
                          ItemID = c(15,15,15,15,17,17,17,17,16,16,16,16),
                          common.names = c("15_AS_SA1_Correct","15_AS_SA10_Correct","15_AS_SA2_Correct","15_AS_SA3_Correct",
                                            "17_AS_2_B2","17_AS_2_B1","17_AS_5_C1","17_AS_4_D1",
                                           "16_AS_SA1_Negative","16_AS_SA11_Prediction","16_AS_SA12_UnitMeaning","16_AS_SA3_Complete"))

> sample.data
   Grade ItemID           common.names
1      4     15      15_AS_SA1_Correct
2      4     15     15_AS_SA10_Correct
3      4     15      15_AS_SA2_Correct
4      4     15      15_AS_SA3_Correct
5      3     17             17_AS_2_B2
6      3     17             17_AS_2_B1
7      3     17             17_AS_5_C1
8      3     17             17_AS_4_D1
9      3     16     16_AS_SA1_Negative
10     3     16  16_AS_SA11_Prediction
11     3     16 16_AS_SA12_UnitMeaning
12     3     16     16_AS_SA3_Complete

I need to order by Grade and ItemID, then by the common.names variable, which contains alphanumeric values.

I used this:

library(dplyr)  # provides %>% and arrange()

sample.data.ordered <- sample.data %>%
  arrange(Grade, ItemID, common.names)

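but it did not produce the order I want for the whole set, because the numbers inside common.names are compared as text (for example, SA10 is placed before SA2).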

My desired output is:

> sample.data.ordered
   Grade ItemID           common.names
1      3     16     16_AS_SA1_Negative
2      3     16     16_AS_SA3_Complete
3      3     16  16_AS_SA11_Prediction
4      3     16 16_AS_SA12_UnitMeaning
5      3     17             17_AS_2_B1
6      3     17             17_AS_2_B2
7      3     17             17_AS_4_D1
8      3     17             17_AS_5_C1
9      4     15      15_AS_SA1_Correct
10     4     15      15_AS_SA2_Correct
11     4     15      15_AS_SA3_Correct
12     4     15     15_AS_SA10_Correct

Any thoughts? Thanks!

Upvotes: 2

Views: 444

Answers (1)

Chris Ruehlemann

Reputation: 21400

A base R solution using order(). Grade and ItemID are sorted directly; for common.names, gsub() with a regular expression and backreferences extracts the numbers embedded in each string, and as.numeric() converts them so that the column is ordered numerically rather than as text:

sample.data[order(sample.data$Grade, 
              sample.data$ItemID, 
              as.numeric(gsub(".*(SA|AS_)(\\d+)_(\\w)?(\\d)?.*", "\\2\\4", sample.data$common.names))),]
   Grade ItemID           common.names
9      3     16     16_AS_SA1_Negative
12     3     16     16_AS_SA3_Complete
10     3     16  16_AS_SA11_Prediction
11     3     16 16_AS_SA12_UnitMeaning
6      3     17             17_AS_2_B1
5      3     17             17_AS_2_B2
8      3     17             17_AS_4_D1
7      3     17             17_AS_5_C1
1      4     15      15_AS_SA1_Correct
3      4     15      15_AS_SA2_Correct
4      4     15      15_AS_SA3_Correct
2      4     15     15_AS_SA10_Correct
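
To see what the third sort key looks like, the gsub() call can be run on its own. This is only an illustrative sketch on a handful of names taken from the question; the variable name key is made up here and is not part of the original answer.

```r
# A few of the names from the question, covering both naming patterns
common.names <- c("15_AS_SA1_Correct", "15_AS_SA10_Correct", "15_AS_SA2_Correct",
                  "17_AS_2_B2", "17_AS_2_B1", "17_AS_5_C1")

# Group 2 captures the digits after "SA" or "AS_"; group 4 captures an
# optional single digit that follows one further word character.
# The replacement "\\2\\4" keeps only those two groups, pasted together.
key <- as.numeric(gsub(".*(SA|AS_)(\\d+)_(\\w)?(\\d)?.*", "\\2\\4", common.names))

print(data.frame(common.names, key))
# key: 1 10 2 22 21 51
```

Because the key is numeric, 10 sorts after 2 and 3, which gives the desired SA1, SA2, SA3, SA10 order. Note that the letter before the last digit (B, C, D) is dropped and only one trailing digit is captured, so names that follow a different layout would probably need an adjusted pattern.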

Upvotes: 1
