user3224522

Reputation: 1151

Sorting names containing numbers

I want to sort a tab-delimited data file with 15 columns by its first column (column[0]). Only column 0 is shown below.

Input file (left) and desired output file (right):

contig1               contig1
contig102             contig1
contig405             contig2
contig1               contig17
contig2               contig102
contig1005            contig405
contig17              contig1005

The script below sorts, but it compares the names as strings, character by character: since '1' < '2', every contig whose number starts with 1 comes before contig2, so contig102 ends up before contig2. How can I sort by the numeric part instead?

# Read all lines, then close the file before reopening it for writing
with open('file.txt', 'r') as f1:
    lines = f1.readlines()
a = sorted(lines, key=lambda l: l.split()[0])
with open('file.txt', 'w') as r:
    r.writelines(a)

Upvotes: 0

Views: 69

Answers (2)

dorvak

Reputation: 9709

How about this one:

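import re

def alphanumsort(x):
    # Split into digit and non-digit runs, e.g. "contig20" -> ['contig', '20', '']
    reg = re.compile(r'(\d+)')
    splitted = reg.split(x)
    # Turn digit runs into integers so they compare numerically, not as text
    return [int(y) if y.isdigit() else y for y in splitted]

print(sorted(["contig1","contig20","bart30","bart03"], key = alphanumsort))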
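To apply the same idea to the file from the question, one option is to build the key from the first tab-separated field of each line. This is only a minimal sketch: `natural_key` and `sort_lines_by_first_column` are hypothetical names that are not part of the answer above, and the sample lines are made up for illustration.

```python
import re

_DIGITS = re.compile(r'(\d+)')

def natural_key(text):
    """Split text into digit and non-digit runs; digit runs become integers."""
    return [int(part) if part.isdigit() else part for part in _DIGITS.split(text)]

def sort_lines_by_first_column(lines):
    """Return the lines sorted naturally by their first tab-separated field."""
    return sorted(lines, key=lambda line: natural_key(line.split('\t')[0]))

# Illustrative data only; real lines would come from f.readlines()
lines = ["contig102\tx\n", "contig2\ty\n", "contig1\tz\n", "contig1005\tw\n"]
print(sort_lines_by_first_column(lines))
```

The key lists compare cleanly as long as every name starts with a non-digit run; names that begin with a digit would put an empty string first and may need extra handling.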

Upvotes: 2

user1907906

Reputation:

If

l.split()[0]

gives

contig1
contig102

then you want to sort on the number that follows the 6-character prefix "contig":

int(l.split()[0][6:])

which is

1
102

So do:

a = sorted(f1, key=lambda l: int(l.split()[0][6:]))
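The slice `[6:]` only works if every name really starts with "contig". A slightly more defensive variant could check the prefix first. This is a sketch under that assumption: `contig_number` and `PREFIX` are hypothetical names, and the sample rows are invented for illustration.

```python
PREFIX = "contig"  # assumed fixed prefix, taken from the question's sample data

def contig_number(line, prefix=PREFIX):
    """Return the integer after the prefix in the first whitespace-separated field."""
    name = line.split()[0]
    if not name.startswith(prefix):
        raise ValueError("unexpected name: %r" % name)
    return int(name[len(prefix):])

# Illustrative data only; real lines would come from the input file
rows = ["contig102\tA\n", "contig2\tB\n", "contig17\tC\n"]
print(sorted(rows, key=contig_number))
```

Raising on an unexpected name makes a malformed line fail loudly instead of being sorted by a meaningless number.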

Upvotes: 1
