pogibas

Reputation: 28339

Sort files/objects with numbers and letters (alphanumeric) names

My files are:

CT.BP.50.txt
CT.BP.200.txt
CT.BP.500.txt 
GP.BP.50.txt
GP.BP.200.txt 
GP.BP.500.txt 

files <- c("CT.BP.50.txt", "CT.BP.200.txt", "CT.BP.500.txt", "GP.BP.50.txt", "GP.BP.200.txt", "GP.BP.500.txt")

I want to perform a specific operation on each of them. I can do this:

for (i in seq_along(files)) {
    foo <- read.table(files[i])
    barplot(table(foo$V1), main = files[i])
}

But R plots them in this order:

"CT.BP.200.txt" "CT.BP.500.txt" "CT.BP.50.txt" "GP.BP.200.txt" "GP.BP.500.txt" "GP.BP.50.txt"

I want them plotted in this order instead, with the numbers sorted numerically:

"CT.BP.50.txt" "CT.BP.200.txt" "CT.BP.500.txt" "GP.BP.50.txt" "GP.BP.200.txt" "GP.BP.500.txt"

How can I sort objects with alphanumeric names?

Upvotes: 4

Views: 2703

Answers (3)

Seth

Reputation: 4795

It looks like you want to sort by specific components of your file names: first the prefix, then the number.

So I would start by breaking the filename into its components with something like:

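# ncol = 4 assumes every name has exactly four dot-separated parts, e.g. "CT.BP.50.txt"
filesmat <- matrix(unlist(strsplit(files, split = "\\.")), byrow = TRUE, ncol = 4)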

Then extract the columns you want to sort by:

numbercomponent <- as.numeric(filesmat[, 3])

varname <- filesmat[, 1]

Then reorder the file names with something like:

files <- files[order(varname, numbercomponent)]

Then plot them any way you want.
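A minimal sketch of the same idea packaged as a function, assuming every name ends in .<number>.txt. The helper name order_by_prefix_and_number and the regular expression are illustrative assumptions, not part of the answer above. Using sub() with capture groups avoids the fixed ncol = 4, so it should also work for longer names such as "Gen.Var_CT.BP.50.txt":

```r
# order_by_prefix_and_number() is a hypothetical helper for illustration.
# Assumes every name looks like "<prefix>.<number>.txt".
order_by_prefix_and_number <- function(files) {
  pattern <- "^(.*)\\.([0-9]+)\\.txt$"
  # Fail loudly instead of silently producing NA for unexpected names
  stopifnot(all(grepl(pattern, files)))
  prefix <- sub(pattern, "\\1", files)               # e.g. "CT.BP"
  number <- as.numeric(sub(pattern, "\\2", files))   # e.g. 50
  # Sort by prefix first, then numerically within each prefix
  order(prefix, number)
}

files <- c("CT.BP.200.txt", "CT.BP.500.txt", "CT.BP.50.txt",
           "GP.BP.200.txt", "GP.BP.500.txt", "GP.BP.50.txt")
files <- files[order_by_prefix_and_number(files)]
print(files)
```

Compared with the matrix approach, the regular expression tolerates any number of dots in the prefix, at the cost of assuming the number sits immediately before .txt.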

Upvotes: 1

Brian Diggs

Reputation: 58835

The problem is that list.files() returns the file names in standard lexical (character-by-character) sorted order, so the digits are compared one position at a time rather than as parts of a number.

files <- sort(c("Gen.Var_CT.BP.200.txt", "Gen.Var_CT.BP.500.txt", 
                "Gen.Var_CT.BP.50.txt", "Gen.Var_GP.BP.200.txt",
                "Gen.Var_GP.BP.500.txt", "Gen.Var_GP.BP.50.txt"))

On my system, this gives:

> files
[1] "Gen.Var_CT.BP.200.txt" "Gen.Var_CT.BP.50.txt"  "Gen.Var_CT.BP.500.txt"
[4] "Gen.Var_GP.BP.200.txt" "Gen.Var_GP.BP.50.txt"  "Gen.Var_GP.BP.500.txt"
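To see the character-by-character comparison in isolation, here is a small sketch on bare digit strings. It uses sort(..., method = "radix"), which orders character vectors in the C locale, so the result should not depend on your system's locale; converting with as.numeric() first gives the numeric order instead:

```r
# Digit strings compared as characters versus as numbers
nums <- c("50", "200", "500")

# method = "radix" sorts character vectors in the C locale, comparing one
# character at a time: "2" < "5", and "50" is a prefix of "500"
lexical <- sort(nums, method = "radix")

# Converting to numbers first gives the order a person would expect
numeric_order <- nums[order(as.numeric(nums))]

print(lexical)
print(numeric_order)
```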

The function gtools::mixedsort will, in general, sort the way you want: runs of digits within a string are treated as numbers for sorting purposes. There is a snag with your example, though: mixedsort treats . as a possible decimal point, so it sees .200. as a potential number, which cannot actually be sorted as a number. Since your file names contain no real decimal points, you can work around this by replacing every . with a space before computing the order (mixedorder returns the ordering permutation rather than the sorted vector):

library(gtools)  # provides mixedsort() and mixedorder()
files <- files[mixedorder(gsub("\\.", " ", files))]

So files is now sorted as:

> files
[1] "Gen.Var_CT.BP.50.txt"  "Gen.Var_CT.BP.200.txt" "Gen.Var_CT.BP.500.txt"
[4] "Gen.Var_GP.BP.50.txt"  "Gen.Var_GP.BP.200.txt" "Gen.Var_GP.BP.500.txt"
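If you would rather not depend on gtools, one swapped-in base R technique (not what mixedsort does internally) is to left-pad every run of digits with zeros to a common width and then sort the padded keys as plain strings. The sketch below assumes every digit run is a non-negative integer (no signs or decimal points), which sidesteps the . problem entirely; natural_order is a hypothetical helper name:

```r
# natural_order() is a hypothetical helper, not part of gtools or base R.
# It left-pads every run of digits with zeros so that comparing the padded
# strings character by character gives the same result as comparing the
# digit runs as numbers. Signs and decimal points are not handled.
natural_order <- function(x) {
  matches <- gregexpr("[0-9]+", x)
  runs <- regmatches(x, matches)
  # Widest digit run across all names (0 if there are no digits at all)
  width <- max(0L, nchar(unlist(runs)))
  keys <- x
  regmatches(keys, matches) <- lapply(runs, function(r) {
    if (length(r) == 0) return(r)  # no digits in this name
    paste0(strrep("0", width - nchar(r)), r)
  })
  # method = "radix" compares the padded keys in the C locale
  order(keys, method = "radix")
}

files <- c("Gen.Var_CT.BP.200.txt", "Gen.Var_CT.BP.50.txt", "Gen.Var_CT.BP.500.txt",
           "Gen.Var_GP.BP.200.txt", "Gen.Var_GP.BP.50.txt", "Gen.Var_GP.BP.500.txt")
files <- files[natural_order(files)]
print(files)
```

Digit runs with leading zeros (for example "007" and "7") compare as equal under this scheme, and order() keeps them in their original relative order.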

Upvotes: 11

Jeff Allen

Reputation: 17517

Might this do it?

# List the files explicitly in the desired order
files <- c("Gen.Var_CT.BP.50.txt", "Gen.Var_CT.BP.200.txt", "Gen.Var_CT.BP.500.txt",
           "Gen.Var_GP.BP.50.txt", "Gen.Var_GP.BP.200.txt", "Gen.Var_GP.BP.500.txt")
for (i in seq_along(files)) {
  b <- read.table(files[i])
  barplot(table(b$V1), main = files[i])
}

Upvotes: 2
