Reputation: 1756
I have a problem which is trivial in the normal programming sense, but i want to find another solution to do this using some python tricks if possible.
I have a text file as follow:
A1 1 5 g1_0 10
A1 6 8 g2_0 13
A1 9 12 g3 18
A1 15 24 g4 2
A1 25 27 g5 5
A1 30 33 g1_1 20
A1 44 67 g10 19
A1 69 77 g9 19
A1 80 99 g6 19
A1 153 166 g2_1 19
what i want to do is to filter the text file in the following way: when i have "_0" or "_1" or "_2" in column 4, i want to check the column 5. if column 5 have the highest number i want to remove all the others with "_" and keep only the one with higher score.
P.S: i want to keep the one with the highest score, for each value of column 4. e.g in this case these values will stay, it's not an all against all comparison, but each pair or triplet that have "_0" "_1" "_2" should be compared alone for the highest score:
A1 9 12 g3 18
A1 15 24 g4 2
A1 25 27 g5 5
A1 30 33 g1_1 20
A1 44 67 g10 19
A1 69 77 g9 19
A1 80 99 g6 19
A1 153 166 g2_1 19
is there anyway to do this without looping a lot ?
Upvotes: 1
Views: 253
Reputation: 6957
import numpy as np
nk=np.genfromtxt('input.txt', dtype=None)
result1={}
for x in nk:
result1.setdefault(x[3].split('_')[0],[]).append(x[4])
for x in nk:
if x[4]==max(result1[x[3].split('_')[0]]):print x
output:
('A1', 9, 12, 'g3', 18)
('A1', 15, 24, 'g4', 2)
('A1', 25, 27, 'g5', 5)
('A1', 30, 33, 'g1_1', 20)
('A1', 44, 67, 'g10', 19)
('A1', 69, 77, 'g9', 19)
('A1', 80, 99, 'g6', 19)
('A1', 153, 166, 'g2_1', 19)
Upvotes: 1