Gabriele B

Reordering data.frame columns first by a fixed order of subgroups, then alphabetically within each subgroup

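I need to reorder the columns of a huge data.frame based on two conditions. The first is a fixed order based on column prefixes (which I use as a kind of column category). Inside each category, the order is alphabetical.

In other words: first the columns without a prefix, then details.*, attributes.*, stats.*, poll.* and finally endpoint.*, with the columns of each group sorted alphabetically.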

Here's a reproducible data set to play with:

blah <- data.frame("id"=1,                                                         
                   "details.thumbnail"=1,                                          
                   "details.image"=1,
                   "type"=1,                                                       
                   "name"=1,                                               
                   "attributes.num"=1,                                             
                   "attributes.boardgamemechanic"=1,                               
                   "attributes.boardgameexpansion"=1,                              
                   "stats.averageweight.value"=1,                                  
                   "poll.results.suggested_numplayers.7.Recommended.numvotes"=1,
                   "poll.results.suggested_numplayers.7.NotRecommended.numvotes"=1,
                   "attributes.boardgamemechanic"=1,   
                   "endpoint.uri"=1) 

I'm really puzzled: every solution I write is convoluted and definitely not elegant.
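A side note on the example data: it lists attributes.boardgamemechanic twice. Because data.frame() defaults to check.names = TRUE, the names are passed through make.names(unique = TRUE), so the second copy is silently renamed to attributes.boardgamemechanic.1. That is why this name shows up in the answer's output. A quick check (dup_default and dup_raw are illustrative object names only):

```r
# Two columns with the same name, as in the question's data.frame() call.
# Default check.names = TRUE: duplicates are made unique with a ".1" suffix.
dup_default <- data.frame(attributes.boardgamemechanic = 1,
                          attributes.boardgamemechanic = 1)
names(dup_default)

# check.names = FALSE keeps the names verbatim, duplicates included.
dup_raw <- data.frame(attributes.boardgamemechanic = 1,
                      attributes.boardgamemechanic = 1,
                      check.names = FALSE)
names(dup_raw)
```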


Answers (1)

Pierre L

Here's one way:

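# prefix patterns, in the required group order (names without a dot come first)
cols <- c("^([^.]+)$", "^(details)", "^(attributes)", "^(stats)", "^(poll)", "^(endpoint)")
s <- names(blah)
n <- seq_along(cols)
# tag each name with its group number: "id" -> "1id", "details.image" -> "2details.image", ...
for(i in n) s <- sub(cols[i], paste0(n[i], "\\1"), s)
# sort the tagged names, then strip the one-digit tag again
sorted <- sort(s)
new_vec <- substr(sorted, 2, nchar(sorted))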
new_vec
#  [1] "id"                                                         
#  [2] "name"                                                       
#  [3] "type"                                                       
#  [4] "details.image"                                              
#  [5] "details.thumbnail"                                          
#  [6] "attributes.boardgameexpansion"                              
#  [7] "attributes.boardgamemechanic"                               
#  [8] "attributes.boardgamemechanic.1"                             
#  [9] "attributes.num"                                             
# [10] "stats.averageweight.value"                                  
# [11] "poll.results.suggested_numplayers.7.NotRecommended.numvotes"
# [12] "poll.results.suggested_numplayers.7.Recommended.numvotes"   
# [13] "endpoint.uri" 

We use regular expressions to find each column's prefix, or the absence of one. The patterns in cols are listed in the order given in the question, and the loop prepends each pattern's position to every name it matches: 1 for names without a prefix, 2 for details, and so on. A for loop suits this because each iteration works on the result of the previous one. The tagged vector is then easy to sort; finally the one-digit tag is stripped off again, so the result can be used to subset the data with blah[, new_vec].
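The digit-tagging trick works for these six groups, but it relies on a one-character tag: with ten or more groups a tag like "10" would sort before "2", and a name matching none of the patterns would lose its first character to substr(), making the subset fail. A sketch of an alternative that sorts on two keys instead (group position, then name) avoids both problems. It assumes the prefix is everything before the first "." and that names without a dot form the first group; groups, prefix and grp_rank are illustrative names, not part of the original answer:

```r
# Rebuild the question's example data (the duplicated name becomes ".1").
blah <- data.frame("id"=1, "details.thumbnail"=1, "details.image"=1, "type"=1, "name"=1,
                   "attributes.num"=1, "attributes.boardgamemechanic"=1,
                   "attributes.boardgameexpansion"=1, "stats.averageweight.value"=1,
                   "poll.results.suggested_numplayers.7.Recommended.numvotes"=1,
                   "poll.results.suggested_numplayers.7.NotRecommended.numvotes"=1,
                   "attributes.boardgamemechanic"=1, "endpoint.uri"=1)

# Prefix groups in the required order; "" stands for "no prefix".
groups <- c("", "details", "attributes", "stats", "poll", "endpoint")

nms <- names(blah)
# Text before the first "."; names without a dot get "".
prefix <- ifelse(grepl(".", nms, fixed = TRUE), sub("\\..*$", "", nms), "")
# Position of each name's group; an unknown prefix gives NA.
grp_rank <- match(prefix, groups)
# Primary key: group position; secondary key: the name itself.
new_vec <- nms[order(grp_rank, nms)]
blah <- blah[, new_vec]
```

Names whose prefix is not listed in groups get NA from match() and end up after all known groups, because order() places NA last by default.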
