Erika

Reputation: 69

Compare two text files using Python

I'm trying to compare two files and extract the lines of the first file whose first column matches a value in the second file. For example:

File 1:

VarID GeneID TaxName PfamName
3810359 1327    Isochrysidaceae Methyltransf_21&Methyltransf_22
6557609 5442    Peridiniales    NULL
4723299 7370    Prorocentrum    PEPCK_ATP
3019317 10454   Dinophyceae     NULL
2821675 10965   Bacillariophyta PK;PK_C
5559318 12824   Dinophyceae     Cyt-b5&FA_desaturase

File 2:

VarID
3810359
6557609
4723299
5893435
4852156

For the output, I want this file:

VarID GeneID TaxName PfamName
3810359 1327    Isochrysidaceae Methyltransf_21&Methyltransf_22
6557609 5442    Peridiniales    NULL
4723299 7370    Prorocentrum    PEPCK_ATP

I tried this code:

f1 = sys.argv[1]
f2 = sys.argv[2]

file1_rows = []
with open(f1, 'r') as file1:
    for row in file1:
        file1_rows.append(row.split())

# Read data from the second file
file2_rows = []
with open(f2, 'r') as file2:    
    for row in file2:
        file2_rows.append(row.split())

# Compare data and compute results
results = []
for row in file2_rows:
    if row[:1] in file1_rows:
        results.append(row[:4])
    else:
        results.append(row[:4])

# Print the results
for row in results:
    print(' '.join(row))

Can you please help me? Thank you!

Upvotes: 0

Views: 59

Answers (1)

Weiner Nir

Reputation: 1475

Your problem is here:

if row[:1] in file1_rows:

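row[:1] returns a list with one field (the first column of the row), while file1_rows holds complete rows, so the test never matches. Instead, compare against the first column of each row.

This is the new code:

if row[0] in [r[0] for r in file1_rows]:

Also, remove the else attached to this if (I guess it was mistakenly added while debugging). Keep in mind that row here comes from the second file, so appending it gives only the VarID; to get the full line, append the matching row from the first file.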
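To see why the original membership test never matches, here is a minimal sketch using two rows taken from the question's File 1. The variable names slice_match, first_columns and column_match are only illustrative and do not appear in the original code.

```python
# Rows as produced by row.split(): each row is a list of fields
file1_rows = [
    ['3810359', '1327', 'Isochrysidaceae', 'Methyltransf_21&Methyltransf_22'],
    ['6557609', '5442', 'Peridiniales', 'NULL'],
]
row = ['3810359']  # a row from File 2 after split()

# A one-element list is compared against whole four-field rows
slice_match = row[:1] in file1_rows

# Comparing the ID against the first column of each row finds it
first_columns = [r[0] for r in file1_rows]
column_match = row[0] in first_columns

print(slice_match, column_match)  # False True
```

The `in` operator on a list compares each element for equality, so a one-element list can only match a row that also has exactly one field.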

There are a few other good practices you can apply; I put them all together here:

import sys

f1 = sys.argv[1]
f2 = sys.argv[2]

# Read data from the first file
with open(f1, 'r') as file1:
    file1_rows = file1.read().splitlines()

# Read data from the second file
with open(f2, 'r') as file2:
    file2_rows = file2.read().splitlines()

# Compare data and compute results
results = []
for row2 in file2_rows:
    for row in file1_rows:
        # Match on the first column only, not anywhere in the line
        if row.split()[:1] == [row2.strip()]:
            results.append(row)
            break

print('\n'.join(results))
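The nested loop above scans the first file once for every ID in the second file. As a sketch of an alternative, assuming the IDs of the second file fit comfortably in memory, they can be stored in a set and the first file scanned a single time. The helper name filter_rows_by_id and the inline sample data are hypothetical, chosen only to keep the example self-contained.

```python
def filter_rows_by_id(rows, wanted_ids):
    """Return the rows whose first whitespace-separated field is in wanted_ids."""
    kept = []
    for row in rows:
        fields = row.split()
        # Skip blank lines; compare only the first column
        if fields and fields[0] in wanted_ids:
            kept.append(row)
    return kept


# A few lines from the question's File 1
file1_rows = [
    "VarID GeneID TaxName PfamName",
    "3810359 1327    Isochrysidaceae Methyltransf_21&Methyltransf_22",
    "6557609 5442    Peridiniales    NULL",
    "3019317 10454   Dinophyceae     NULL",
]
# IDs from File 2; the header is included so the header line is kept too
file2_ids = {"VarID", "3810359", "6557609", "5893435"}

result = filter_rows_by_id(file1_rows, file2_ids)
print("\n".join(result))
```

With this approach the output follows the line order of the first file rather than the order of the IDs in the second file, which matches the expected output in the question.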

Upvotes: 2
