I need to reorder the columns of a huge data.frame based on two conditions. The first is a fixed order based on column prefixes (which I use as a kind of column category). Inside each category, the order is alphabetical.
In other words:
First, by a fixed order of prefixes: "" (no prefix), "details", "attributes", "stats", "poll", "endpoint".
Then, alphabetically within each group.
Here's a reproducible data.frame to play with:
blah <- data.frame("id"=1,
                   "details.thumbnail"=1,
                   "details.image"=1,
                   "type"=1,
                   "name"=1,
                   "attributes.num"=1,
                   "attributes.boardgamemechanic"=1,
                   "attributes.boardgameexpansion"=1,
                   "stats.averageweight.value"=1,
                   "poll.results.suggested_numplayers.7.Recommended.numvotes"=1,
                   "poll.results.suggested_numplayers.7.NotRecommended.numvotes"=1,
                   "attributes.boardgamemechanic"=1,
                   "endpoint.uri"=1)
I'm stuck: every solution I write is convoluted and definitely not elegant.
Here's one way:
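# One regex per category, in the required order; the first matches names without a "." (no prefix)
cols <- c("^([^.]+)$", "^(details)", "^(attributes)", "^(stats)", "^(poll)", "^(endpoint)")
s <- names(blah)
n <- seq_along(cols)
# Prepend each name with the number of its category (1 = no prefix, 2 = details, ...)
for(i in n) s <- sub(cols[i], paste0(n[i], "\\1"), s)
# Sort the numbered names, then strip the leading digit
s_sorted <- s[order(s)]
new_vec <- substr(s_sorted, 2, nchar(s_sorted))
new_vec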
# [1] "id"
# [2] "name"
# [3] "type"
# [4] "details.image"
# [5] "details.thumbnail"
# [6] "attributes.boardgameexpansion"
# [7] "attributes.boardgamemechanic"
# [8] "attributes.boardgamemechanic.1"
# [9] "attributes.num"
# [10] "stats.averageweight.value"
# [11] "poll.results.suggested_numplayers.7.NotRecommended.numvotes"
# [12] "poll.results.suggested_numplayers.7.Recommended.numvotes"
# [13] "endpoint.uri"
We use regular expressions to find the prefixed and non-prefixed column names. The cols vector lists the patterns in the order given in the question, and each matching name gets that pattern's position prepended: names with no prefix get 1, names starting with details get 2, and so on. I use a for loop because each substitution has to work on the result of the previous one. The numbered names are then easy to sort. Finally, the leading digit is stripped off, and the result can be used to reorder the columns with blah[,new_vec].
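The digit-prefix trick depends on every category number being a single character: substr(..., 2, ...) removes exactly one character, so with ten or more categories a name prefixed with "10" would sort between the "1" and "2" groups and keep a stray "0", and a column matching none of the patterns gets no number at all, so its real first letter would be cut off. A possible variant, sketched below under the assumption that a column's prefix is the text before its first ".", keeps the category as a separate numeric key and lets order() sort by category first and by name second. reorder_cols is a hypothetical helper name, not an existing function.

```r
# Rebuild the question's example; data.frame()'s default check.names = TRUE
# renames the duplicated "attributes.boardgamemechanic" to "attributes.boardgamemechanic.1"
blah <- data.frame("id" = 1, "details.thumbnail" = 1, "details.image" = 1,
                   "type" = 1, "name" = 1, "attributes.num" = 1,
                   "attributes.boardgamemechanic" = 1,
                   "attributes.boardgameexpansion" = 1,
                   "stats.averageweight.value" = 1,
                   "poll.results.suggested_numplayers.7.Recommended.numvotes" = 1,
                   "poll.results.suggested_numplayers.7.NotRecommended.numvotes" = 1,
                   "attributes.boardgamemechanic" = 1,
                   "endpoint.uri" = 1)

# Category order from the question; "" stands for "no prefix"
prefixes <- c("", "details", "attributes", "stats", "poll", "endpoint")

# Hypothetical helper: sort columns by category rank first, then by full name
reorder_cols <- function(df, prefixes) {
  nms <- names(df)
  # Assumption: the prefix is the text before the first "."; names without "." get ""
  prefix <- ifelse(grepl(".", nms, fixed = TRUE), sub("\\..*$", "", nms), "")
  # Position of each prefix in the fixed order; unknown prefixes become NA
  cat_rank <- match(prefix, prefixes)
  # order() breaks ties in cat_rank by name; NA ranks go last (na.last = TRUE)
  df[, order(cat_rank, nms), drop = FALSE]
}

blah <- reorder_cols(blah, prefixes)
print(names(blah))
```

For the example data this gives the same column order as the output shown above. A column whose prefix is not listed in prefixes (say, a hypothetical zzz.a) is moved to the end instead of losing its first character.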