ifreak
ifreak

Reputation: 1756

filter text file based on multiple columns

I have a problem which is trivial in the normal programming sense, but i want to find another solution to do this using some python tricks if possible.

I have a text file as follow:

A1 1 5 g1_0 10
A1 6 8 g2_0 13
A1 9 12 g3 18
A1 15 24 g4 2
A1 25 27 g5 5
A1 30 33 g1_1 20
A1 44 67 g10 19
A1 69 77 g9 19
A1 80 99 g6 19
A1 153 166 g2_1 19

what i want to do is to filter the text file in the following way: when i have "_0" or "_1" or "_2" in column 4, i want to check the column 5. if column 5 have the highest number i want to remove all the others with "_" and keep only the one with higher score.

P.S: i want to keep the one with the highest score, for each value of column 4. e.g in this case these values will stay, it's not an all against all comparison, but each pair or triplet that have "_0" "_1" "_2" should be compared alone for the highest score:

A1 9 12 g3 18
A1 15 24 g4 2
A1 25 27 g5 5
A1 30 33 g1_1 20
A1 44 67 g10 19
A1 69 77 g9 19
A1 80 99 g6 19
A1 153 166 g2_1 19

is there anyway to do this without looping a lot ?

Upvotes: 1

Views: 253

Answers (1)

namit
namit

Reputation: 6957

import numpy as np
nk=np.genfromtxt('input.txt', dtype=None)
result1={}
for x in nk:
    result1.setdefault(x[3].split('_')[0],[]).append(x[4])
for x in nk:
    if x[4]==max(result1[x[3].split('_')[0]]):print x

output:

('A1', 9, 12, 'g3', 18)
('A1', 15, 24, 'g4', 2)
('A1', 25, 27, 'g5', 5)
('A1', 30, 33, 'g1_1', 20)
('A1', 44, 67, 'g10', 19)
('A1', 69, 77, 'g9', 19)
('A1', 80, 99, 'g6', 19)
('A1', 153, 166, 'g2_1', 19)

Upvotes: 1

Related Questions