Reputation: 25
I have a similar problem in two scenarios.
Scenario 1: a data frame with duplicated column names from two groups, in no particular order: ALL|ALL|AML|ALL|AML|AML|AML|ALL
Scenario 2: data frame column names with numeric suffixes: ALL, ALL.1, ALL.2, AML.1, AML.2, ... These include double-digit numbers too, so if I sort them in ascending order I get ALL.1, ALL.10, ALL.11
I want to group all the ALLs first, followed by the AMLs. How can I achieve this in both scenarios?
Upvotes: 0
Views: 701
Reputation: 51592
One way to approach this is to strip the numeric suffix and order by what remains:
y <- c('ALL', 'ALL.1', 'ALL.2', 'AML.1', 'AML.2', 'ALL.10')
y[order(gsub('\\.\\d+', '', y))]
#[1] "ALL" "ALL.1" "ALL.2" "ALL.10" "AML.1" "AML.2"
#or to use it in a data frame,
df[, order(gsub('\\.\\d+', '', names(df)))]
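Note that the approach above only groups by prefix. Within each group the columns keep their original relative order, so it assumes the suffixes already appear in ascending order. If they do not, a minimal base R sketch that also sorts numerically by suffix could look like this (the variable names `prefix`, `suffix`, `has_suffix` and `sorted` are just illustrative, and a name without a suffix is assumed to sort first in its group):

```r
# Names in deliberately scrambled order
y <- c('ALL.10', 'AML.2', 'ALL', 'ALL.2', 'AML.1', 'ALL.1')

# Group label: the name with any trailing ".<digits>" removed
prefix <- sub('\\.\\d+$', '', y)

# Numeric suffix; names without a suffix get 0 so they come first in their group
has_suffix <- grepl('\\.\\d+$', y)
suffix <- numeric(length(y))
suffix[has_suffix] <- as.numeric(sub('^.*\\.', '', y[has_suffix]))

# Order by group first, then by the suffix as a number (so 10 > 2)
sorted <- y[order(prefix, suffix)]
sorted
#[1] "ALL" "ALL.1" "ALL.2" "ALL.10" "AML.1" "AML.2"
```

The same index, order(prefix, suffix) computed on names(df), can be used to reorder the columns of a data frame.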
Additionally, you can use mixedorder from the gtools package, but you will have to replace the . in the suffix so that it is not treated as a decimal point (meaning .10 < .2 rather than 10 > 2), i.e.
library(gtools)
#with the . in suffix
mixedsort(y)
#[1] "ALL.1" "ALL.10" "ALL.2" "ALL" "AML.1" "AML.2"
#without the . in suffix
mixedsort(gsub('\\.', '_', y))
#[1] "ALL" "ALL_1" "ALL_2" "ALL_10" "AML_1" "AML_2"
#or use it on the data frame
df[, mixedorder(gsub('\\.', '_', names(df)))]
As for your first case, I agree with @alistaire that names NEED to be unique. Use make.unique and then follow the method above.
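For the first scenario, a minimal sketch of that combination might look like the following. The one-row data frame and the variable name `nm` are made up purely for illustration. The values 1 to 8 record each column's original position.

```r
# Duplicated column names as in Scenario 1
nm <- c('ALL', 'ALL', 'AML', 'ALL', 'AML', 'AML', 'AML', 'ALL')

# Toy data frame (hypothetical data): one row, value = original column position
df <- as.data.frame(matrix(1:8, nrow = 1))

# make.unique() appends .1, .2, ... to repeated names
names(df) <- make.unique(nm)
names(df)
#[1] "ALL" "ALL.1" "AML" "ALL.2" "AML.1" "AML.2" "AML.3" "ALL.3"

# Group the columns by name with the numeric suffix stripped
df <- df[, order(gsub('\\.\\d+$', '', names(df)))]
names(df)
#[1] "ALL" "ALL.1" "ALL.2" "ALL.3" "AML" "AML.1" "AML.2" "AML.3"
```

Because order() leaves ties in their original order, the columns within each group stay in the sequence they had before.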
Upvotes: 2