Reputation: 1151
I want to sort my tab-delimited data file, which contains 15 columns, by column[0]. Only column 0 is illustrated here:
Input file    Desired output file
contig1       contig1
contig102     contig1
contig405     contig2
contig1       contig17
contig2       contig102
contig1005    contig405
contig17      contig1005
The script below sorts, but it compares the names as strings, character by character: since '1' < '2', it gives me all contigs whose number starts with 1 before moving on to 2, so contig102 comes before contig2. How can I improve it?
f1 = open('file.txt', 'r')
# Sorts on column 0 as plain strings (lexicographic order)
a = sorted(f1.readlines(), key=lambda l: l.split()[0])
f1.close()
r = open('file.txt', 'w')
r.writelines(a)
r.close()
Upvotes: 0
Views: 69
Reputation: 9709
How about this one:
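import re

def alphanumsort(x):
    # Split into digit and non-digit runs, e.g. "contig20" -> ['contig', '20', '']
    reg = re.compile(r'(\d+)')
    splitted = reg.split(x)
    # Digit runs become ints so they compare numerically
    return [int(y) if y.isdigit() else y for y in splitted]

print(sorted(["contig1", "contig20", "bart30", "bart03"], key=alphanumsort))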
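To apply the same idea to the tab-delimited lines from the question, the key can be computed from the first column only. This is just a minimal sketch: the names `natural_key` and `sort_lines_by_first_column` are made up for illustration, and the second column in `sample` is placeholder data, not taken from the real file.

```python
import re

_DIGITS = re.compile(r'(\d+)')

def natural_key(text):
    """Split text into digit and non-digit runs; digit runs compare as ints."""
    return [int(part) if part.isdigit() else part for part in _DIGITS.split(text)]

def sort_lines_by_first_column(lines):
    """Sort tab-delimited lines by a natural ordering of column 0."""
    return sorted(lines, key=lambda line: natural_key(line.split('\t')[0]))

# Placeholder rows: column 0 follows the question, column 1 is invented.
sample = [
    "contig1\ta\n",
    "contig102\tb\n",
    "contig405\tc\n",
    "contig1\td\n",
    "contig2\te\n",
    "contig1005\tf\n",
    "contig17\tg\n",
]

for line in sort_lines_by_first_column(sample):
    print(line, end='')
```

Because `sorted` is stable, the two `contig1` rows keep their original relative order.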
Upvotes: 2
Reputation:
If
l.split()[0]
gives
contig1
contig102
You want to sort on
int(l.split()[0][6:])
which is
1
102
Do
a = sorted(f1, key=lambda l: int(l.split()[0][6:]))
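Note that the `[6:]` slice assumes every name starts with the six-character prefix `contig`. A slightly more defensive sketch of the same approach could check that assumption explicitly; `contig_number` is a hypothetical helper name and the rows below are placeholder data:

```python
def contig_number(line, prefix="contig"):
    """Return the integer that follows the prefix in column 0 of a line."""
    name = line.split()[0]
    if not name.startswith(prefix):
        # Assumption: all names share the same prefix, as in the question.
        raise ValueError("unexpected name in column 0: %r" % name)
    return int(name[len(prefix):])

# Placeholder rows; only column 0 matters for the ordering.
lines = ["contig102\tx\n", "contig2\ty\n", "contig1\tz\n"]
ordered = sorted(lines, key=contig_number)
print(ordered)
```

If the file mixes several prefixes, this key raises an error instead of sorting silently in the wrong order.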
Upvotes: 1