If my tab-delimited file is:
a b 77.8
a d 77.8
e f 56.7
e r 40.0
For each element in row[0], I want to print the row with the max value in row[2], but when several rows tie for that max value, I want to print all of them. How can I modify my code below to do this?
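import csv
from itertools import groupby
from operator import itemgetter
# Python 3: open csv files in text mode with newline='' (Python 2 used 'rb'/'wb')
with open('input.txt', 'r', newline='') as f1:
    with open('out.txt', 'w', newline='') as f2:
        reader = csv.reader(f1, delimiter='\t')
        writer1 = csv.writer(f2, delimiter='\t')
        # groupby merges consecutive rows that share the same column 0
        for group, rows in groupby(filter(lambda x: x[0] != x[1], reader), key=itemgetter(0)):
            best = max(rows, key=lambda r: float(r[2]))  # returns only ONE row, even on ties
            writer1.writerow(best)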
So, my output should be like this:
a b 77.8
a d 77.8
e f 56.7
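One small way to change the loop above is to keep using max(), but only to find the top value, and then write every row equal to it. A sketch, assuming the sample data is held in a string (`sample`) and read through io.StringIO instead of input.txt; `rows_with_max` and `best_rows` are hypothetical helper names:

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def rows_with_max(rows, value_col=2):
    """Return every row whose value_col equals the group's maximum (ties kept)."""
    rows = list(rows)  # materialize: a groupby sub-iterator can only be read once
    best = max(float(r[value_col]) for r in rows)
    return [r for r in rows if float(r[value_col]) == best]


def best_rows(lines):
    reader = csv.reader(lines, delimiter='\t')
    # Same filter as the question: skip rows whose first two columns are equal.
    kept = (r for r in reader if r[0] != r[1])
    out = []
    # groupby only merges *consecutive* rows, so the input is assumed to be
    # ordered by column 0 already (as in the sample file).
    for _, rows in groupby(kept, key=itemgetter(0)):
        out.extend(rows_with_max(rows))
    return out


sample = "a\tb\t77.8\na\td\t77.8\ne\tf\t56.7\ne\tr\t40.0\n"
for row in best_rows(io.StringIO(sample)):
    print('\t'.join(row))
```

This reads each group twice (once for the max, once to filter), which is fine for small groups and avoids sorting.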
An alternative that uses pandas (where reading and writing files is nicer):
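import pandas as pd
# the input is tab-delimited, so sep='\t'; header=None gives integer column labels 0, 1, 2
df = pd.read_table('eg.txt', header=None, sep='\t')
with open('output.txt', 'w', newline='') as f:
    for c in sorted(set(df[0])):  # sorted() makes the output order deterministic
        # rows for this key, largest value in column 2 first
        d = df[df[0] == c].sort_values(by=[2], ascending=False)
        # keep every row that ties with the top value
        d = d[d[2] == d[2].iloc[0]]
        d.to_csv(f, index=False, sep='\t', header=False)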
which gives output:
a b 77.8
a d 77.8
e f 56.7
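A more compact pandas variant, offered as a sketch rather than the answerer's method: groupby(0)[2].transform('max') puts each group's maximum on every row, so a single boolean comparison keeps all ties. It reads the data from an in-memory string (`sample` stands in for eg.txt) and, like the loop above, does not apply the question's x[0] != x[1] filter; adding df = df[df[0] != df[1]] first would cover that.

```python
import io

import pandas as pd

sample = "a\tb\t77.8\na\td\t77.8\ne\tf\t56.7\ne\tr\t40.0\n"

# header=None gives integer column labels 0, 1, 2, as in the answer above.
df = pd.read_csv(io.StringIO(sample), sep='\t', header=None)

# transform('max') broadcasts each group's maximum back onto its rows,
# so the comparison keeps every row that ties for its group's maximum.
group_max = df.groupby(0)[2].transform('max')
best = df[df[2] == group_max]

print(best.to_csv(sep='\t', header=False, index=False), end='')
```

Unlike itertools.groupby, DataFrame.groupby does not need the rows to be pre-sorted by column 0.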
Instead of writing only the max item from rows, you could sort the rows in decreasing order by the third value, group them by the third value, and write the items in the first group:
import csv
from itertools import groupby
from operator import itemgetter
with open('input.txt', 'r', newline='') as f_in, open('out.txt', 'w', newline='') as f_out:
    reader = csv.reader(f_in, delimiter='\t')
    writer1 = csv.writer(f_out, delimiter='\t')
    for group, rows in groupby(filter(lambda x: x[0] != x[1], reader), key=itemgetter(0)):
        # highest value first
        rows = sorted(rows, key=lambda r: float(r[2]), reverse=True)
        # the first run of equal column-2 values holds every row tied for the max
        _, best = next(groupby(rows, key=itemgetter(2)))
        writer1.writerows(best)
Output in out.txt:
a b 77.8
a d 77.8
e f 56.7
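One subtlety in the code above: the rows are sorted by float(r[2]) but grouped by the raw string itemgetter(2), so values such as 77.8 and 77.80 would sort together yet fall into separate groups, and only one of them would be written. A sketch that uses the same numeric key for both steps (`value`, `top_ties` and `sample` are illustrative names, and the data is read from a string rather than input.txt):

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def value(row):
    # Numeric key used for BOTH sorting and grouping, so "77.8" and "77.80" tie.
    return float(row[2])


def top_ties(lines):
    reader = csv.reader(lines, delimiter='\t')
    out = []
    for _, rows in groupby((r for r in reader if r[0] != r[1]), key=itemgetter(0)):
        rows = sorted(rows, key=value, reverse=True)
        # The first group of the descending sort holds every row tied for the max.
        _, best = next(groupby(rows, key=value))
        out.extend(best)
    return out


sample = "a\tb\t77.8\na\td\t77.80\ne\tf\t56.7\ne\tr\t40.0\n"
for row in top_ties(io.StringIO(sample)):
    print('\t'.join(row))
```

Python's sort is stable even with reverse=True, so tied rows keep their original file order.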