I am very new to Python and am having a problem I can't find an answer to. I have a large file that I want to read in, split, and then write out specific fields from. The reading and splitting is where it goes wrong: my script prints the same thing over and over again.
blast_output = open("blast.txt").read()
for line in blast_output:
    subFields = [item.split('|') for item in blast_output.split()]
    print(str(subFields[0][0]) + "\t" + str(subFields[0][1]) + "\t" + str(subFields[1][3]) + "\t" + str(subFields[2][0]))
My input file has many rows that look like this:
c0_g1_i1|m.1 gi|74665200|sp|Q9HGP0.1|PVG4_SCHPO 100.00 372 0 0 1 372 1 372 0.0 754
c1002_g1_i1|m.801 gi|1723464|sp|Q10302.1|YD49_SCHPO 100.00 646 0 0 1 646 1 646 0.0 1310
c1003_g1_i1|m.803 gi|74631197|sp|Q6BDR8.1|NSE4_SCHPO 100.00 246 0 0 1 246 1 246 1e-179 502
c1004_g1_i1|m.804 gi|74676184|sp|O94325.1|PEX5_SCHPO 100.00 598 0 0 1 598 1 598 0.0 1227
The output I am receiving is this:
c0_g1_i1 m.1 Q9HGP0.1 100.00
c0_g1_i1 m.1 Q9HGP0.1 100.00
c0_g1_i1 m.1 Q9HGP0.1 100.00
c0_g1_i1 m.1 Q9HGP0.1 100.00
But what I want is:
c0_g1_i1 m.1 Q9HGP0.1 100.0
c1002_g1_i1 m.801 Q10302.1 100.0
c1003_g1_i1 m.803 Q6BDR8.1 100.0
c1004_g1_i1 m.804 O94325.1 100.0
You don't need to call the read method of the file object; just iterate over the file itself, line by line. Then split line instead of blast_output inside the for loop, so that each iteration works on a different row instead of repeating the same action on the whole file:
with open("blast.txt") as blast_output:
    for line in blast_output:
        subFields = [item.split('|') for item in line.split()]
        print("{:15}{:10}{:10}{}".format(subFields[0][0], subFields[0][1],
                                         subFields[1][3], subFields[2][0]))
I have opened the file with a with statement, so Python closes it automatically when the block ends, even if an error occurs. I have also used str.format to build the output line instead of concatenating strings. This prints:
c0_g1_i1       m.1       Q9HGP0.1  100.00
c1002_g1_i1    m.801     Q10302.1  100.00
c1003_g1_i1    m.803     Q6BDR8.1  100.00
c1004_g1_i1    m.804     O94325.1  100.00
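Since the goal is to write the fields out rather than only print them, here is a minimal sketch that writes them to a tab-separated file. The names parse_blast_line and summarize_blast and the output file name blast_summary.txt are hypothetical, and the parser assumes every row has the whitespace-separated layout shown in the question.

```python
def parse_blast_line(line):
    """Return (query_id, orf_id, accession, identity) from one BLAST row.

    Assumes the layout shown in the question, e.g.
    'c0_g1_i1|m.1 gi|74665200|sp|Q9HGP0.1|PVG4_SCHPO 100.00 ...'
    """
    fields = line.split()                        # split on runs of whitespace
    query_id, orf_id = fields[0].split('|')[:2]  # 'c0_g1_i1|m.1' -> two parts
    accession = fields[1].split('|')[3]          # 4th '|'-separated piece of the subject
    identity = fields[2]                         # percent identity, kept as text
    return query_id, orf_id, accession, identity


def summarize_blast(in_path, out_path):
    """Write query, ORF id, accession and identity, tab-separated, for each non-blank row."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:              # the file object yields one line at a time
            if not line.strip():
                continue              # skip blank lines, e.g. a trailing empty line
            dst.write("\t".join(parse_blast_line(line)) + "\n")
    # Both files are closed here, even if an exception was raised inside the block.
```

With the files from the question this would be called as summarize_blast("blast.txt", "blast_summary.txt"). Keeping the parsing in its own function makes it easy to test on a single row without touching the file system.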
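Great question. You are processing the same input over and over again because of this line:
subFields = [item.split('|') for item in blast_output.split()]
blast_output is the whole file as one string, so every pass through the loop splits the entire file again and prints its first record. The loop also runs far more often than you expect: iterating over a string yields single characters, not lines, so the body executes once per character in the file.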
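To see the problem concretely, here is a small sketch that reproduces it on a shortened two-row excerpt of the input (only the first three columns are kept, an assumption for brevity). It shows that iterating over a string yields characters, that re-splitting the whole text gives the same record on every iteration, and that splitting into lines first gives one record per row.

```python
# Shortened two-row excerpt of blast.txt (only the first three columns kept).
text = ("c0_g1_i1|m.1 gi|74665200|sp|Q9HGP0.1|PVG4_SCHPO 100.00\n"
        "c1002_g1_i1|m.801 gi|1723464|sp|Q10302.1|YD49_SCHPO 100.00\n")

# Iterating over a str yields single characters, not lines.
first_items = list(text)[:5]
print(first_items)  # ['c', '0', '_', 'g', '1']

# The original loop body ignores `line` and re-splits the whole text,
# so every iteration (one per character) computes the same first record.
seen = set()
for line in text:
    subFields = [item.split('|') for item in text.split()]
    seen.add((subFields[0][0], subFields[0][1], subFields[1][3], subFields[2][0]))
print(len(seen))  # 1

# Splitting the text into lines first gives one record per row.
records = []
for line in text.splitlines():
    subFields = [item.split('|') for item in line.split()]
    records.append((subFields[0][0], subFields[0][1], subFields[1][3], subFields[2][0]))
print(records)
```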
With that line fixed, and keeping your original string concatenation, it looks like this:
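blast_output = open("blast.txt").read()
# splitlines() gives one string per row; looping over blast_output directly gives single characters
for line in blast_output.splitlines():
    if not line.strip():
        continue  # skip blank lines, which would otherwise raise IndexError
    subFields = [item.split('|') for item in line.split()]
    print(subFields[0][0] + "\t" + subFields[0][1] + "\t" + subFields[1][3] + "\t" + subFields[2][0])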
See Moses Koledoye's answer for a tidier version that uses a with statement and str.format.
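The desired output in the question shows 100.0 rather than the 100.00 that appears in the file. If that difference is intentional (an assumption; it may just be a typo), the identity column can be parsed as a number and re-formatted. A small sketch, using a hypothetical helper name format_row:

```python
def format_row(line):
    """Return query id, ORF id, accession and identity (one decimal place), tab-separated."""
    subFields = [item.split('|') for item in line.split()]
    identity = float(subFields[2][0])  # e.g. '100.00' -> 100.0
    return "{}\t{}\t{}\t{:.1f}".format(subFields[0][0], subFields[0][1],
                                       subFields[1][3], identity)

row = "c0_g1_i1|m.1 gi|74665200|sp|Q9HGP0.1|PVG4_SCHPO 100.00 372 0 0 1 372 1 372 0.0 754"
print(format_row(row))
```

Note that {:.1f} rounds to one decimal place, so a value like 98.57 would become 98.6; use str(identity) instead if you want Python's default float text, which keeps such digits unrounded.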