Reordering column names

Question

I have a similar problem in two scenarios.

Scenario 1: a data frame with repeated column names from two groups, in no particular order: ALL|ALL|AML|ALL|AML|AML|AML|ALL

Scenario 2: data frame column names with numeric suffixes: ALL, ALL.1, ALL.2, AML.1, AML.2, ... including double-digit suffixes. If I sort these in ascending order, I get ALL.1, ALL.10, ALL.11

I wish to group all the ALLs first, followed by the AMLs. How can I achieve this in both scenarios?

Sotos · Accepted Answer

One way to approach this:

y <- c('ALL', 'ALL.1', 'ALL.2', 'AML.1', 'AML.2', 'ALL.10')

y[order(gsub('\\.\\d+', '', y))]
#[1] "ALL"    "ALL.1"  "ALL.2"  "ALL.10" "AML.1"  "AML.2" 

#or to use it in a data frame,
df[, order(gsub('\\.\\d+', '', names(df)))]
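
Note that order() leaves ties in their original order, so the approach above groups the prefixes but keeps the suffixes in whatever order they arrived. If the suffixes might not already be in numeric order, one possible base R sketch is to split each name into a prefix and an integer suffix and order by both. The variable names prefix, suffix and sorted are illustrative, and treating a name without a suffix as suffix 0 is an assumption made here so that it sorts first in its group.

```r
# Names deliberately shuffled so the suffixes are not in numeric order
y <- c('ALL.10', 'AML.2', 'ALL', 'ALL.2', 'AML.1', 'ALL.1')

# Prefix: everything before a trailing ".<digits>"
prefix <- sub('\\.\\d+$', '', y)

# Suffix: the trailing digits as an integer; names without a suffix get 0 (assumption)
has_suffix <- grepl('\\.\\d+$', y)
suffix <- as.integer(ifelse(has_suffix, sub('^.*\\.(\\d+)$', '\\1', y), '0'))

# Order by group first, then numerically by suffix within each group
sorted <- y[order(prefix, suffix)]
sorted
```

The same index vector could be used on a data frame, e.g. df[, order(prefix, suffix)] with prefix and suffix computed from names(df).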

Alternatively, you can use mixedorder from the gtools package, but you will have to replace the . in the suffix so that it is not treated as a decimal point (which would give .10 < .2 rather than 10 > 2), i.e.

library(gtools)

#with the . in suffix
mixedsort(y)
#[1] "ALL.1"  "ALL.10" "ALL.2"  "ALL"    "AML.1"  "AML.2" 

#without the . in suffix
mixedsort(gsub('\\.', '_', y))
#[1] "ALL"    "ALL_1"  "ALL_2"  "ALL_10" "AML_1"  "AML_2" 

#or use it on the data frame
df[, mixedorder(gsub('\\.', '_', names(df)))]

As for your first case, I agree with @alistaire that column names need to be unique. Apply make.unique to the names first, then follow the method above.
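
For the first scenario, this could look roughly like the sketch below. The data frame df, its contents and the name df_sorted are made up purely for illustration; only the column names come from the question.

```r
# Scenario 1: repeated column names in no particular order
nms <- c('ALL', 'ALL', 'AML', 'ALL', 'AML', 'AML', 'AML', 'ALL')

# Hypothetical 2 x 8 data frame carrying those names
df <- as.data.frame(matrix(1:16, nrow = 2))
names(df) <- nms

# make.unique() appends .1, .2, ... to the repeated names
names(df) <- make.unique(names(df))

# Group the columns by prefix, as in the method above
df_sorted <- df[, order(sub('\\.\\d+$', '', names(df)))]
names(df_sorted)
#[1] "ALL"   "ALL.1" "ALL.2" "ALL.3" "AML"   "AML.1" "AML.2" "AML.3"
```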
