user3224522
user3224522

Reputation: 1151

How to sort by two lines and print the line with the highest value

I have an input file,

1A  Traes_1AS_6052071D9.1   99.01   101 99.0    
1A  Traes_1DS_6BA87D1DA.1   96.04   101 99.0    
1A  Traes_1BS_480915AD0.1   94.06   101 99.0    
1B  Traes_1AS_49D585BA6.2   99.01   101 72.0    
1B  Traes_1BS_47F027BBE.2   98.02   101 89.0    
1B  Traes_1DS_3F816B920.1   97.03   101 92.0    
1C  Traes_1AS_3451447E0.1   99.01   101 97.0
1C  Traes_1BS_9F243CEA6.2   92.93   99  97.0    
1C  Traes_1DS_2A6443F45.1   89.90   99  97.0    

I need to

  1. group and iterate inside each line[0],
  2. sort line[4] from lowest to highest value and take the highest value
  3. if they are similar, print the results choosing the one that has highest value in line[2], so that my output file looks like this:

Required output:

1A  Traes_1AS_6052071D9.1   99.01   101 99.0    
1B  Traes_1DS_3F816B920.1   97.03   101 92.0    
1C  Traes_1AS_3451447E0.1   99.01   101 97.0    

This is my try, but it only takes according to the highest line[4]:

import csv
from itertools import groupby
from operator import itemgetter
with open('my_file','rb') as f1:
with open('out_file', 'wb') as f2:
    reader = csv.reader(f1, delimiter='\t')
    writer1 = csv.writer(f2, delimiter='\t')
    for group, rows in groupby(reader, itemgetter(0)):
        seen = set()
        rows = sorted(rows, key=lambda r: float(r[4]))
        for row in rows:
            max(rows, key=lambda r: float(r[4]))
            writer1.writerow(row)

Upvotes: 2

Views: 190

Answers (1)

tobias_k
tobias_k

Reputation: 82899

Just have the key function for max return a tuple of (r[4], r[2])

Slightly simplified example (without output file)

with open('data.txt','rb') as f1:
    reader = csv.reader(f1, delimiter='\t')
    for group, rows in groupby(reader, itemgetter(0)):
        best = max(rows, key=lambda r: (float(r[4]), float(r[2])))
        print best

Upvotes: 3

Related Questions