Python - parse a line of text

Question

I have the following input from a text file:

Title Value Position Perturbation 1.5 0.6 8.5 9.8 0 8.5 9.6 0.5 0.6 (...)

Title Value Position Perturbation 3 1.5 6 0 0.8 9.7 5.3 9.9 0.7 0.9 (...)

I want to remove the first 4 columns. The remaining numeric columns should be processed in groups of 4 values: within each group, swap the 2nd and 3rd values and drop the 4th. The output should look like:

1.5 8.5 0.6 0 9.6 8.5 0.6 (...)
3 6 1.5 0.8 5.3 9.7 0.7 (...)

For this purpose I wrote the following Python code:

import sys

input_file= open (sys.argv[1],'r')
output_file= open (sys.argv[2], 'w')
with open(sys.argv[1]) as input_file:
    for i, line in enumerate(input_file):
        output_file.write ('\n')
        marker_info= line.split()
        #snp= marker_info[0]
        end= len(marker_info)   
        x=4
        y=8
        # while y<=len(marker_info):
        while x<=end:
            intensities= marker_info[x:y]
            AA= intensities[0]
            BB= intensities[1]
            AB= intensities[2]
            NN= intensities[3]
            output_file.write ('%s' '	' '%s' '	' '%s' '	' % (AA, AB, BB))
            x= y 
            y= x + 4
input_file.close()
output_file.close()

The code seems to work fine, but for each line the last four values are missing. I guess the problem is in the "while" statement, but I have no clue how to solve it (I know it seems like a simple problem).

Thanks in advance for any suggestions.

Yanuar Kusnadi · Accepted Answer

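Try this one. It is all based on your script, except for the "while" condition and the way the files are opened. With "while x<=end", the last iteration slices past the end of the list and gets fewer than four values; "while y<=end" only processes complete groups of four.

Input File :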

Title Value Position Perturbation 1.5 0.6 8.5 9.8 0 8.5 9.6 0.5 0.6 1.1 2.2 3.3
Title Value Position Perturbation 3 1.5 6 0 0.8 9.7 5.3 9.9 0.7 0.9 1.1 2.2
Title Value Position Perturbation 3.1 2.5 1.6 0 1.8 2.7 4.3 6.9 3.7 1.9 2.1 3.2

Script :

with open("parser.txt", "r") as input_file, open("output_parser.txt","w") as output_file:
    for i, line in enumerate(input_file):
        output_file.write ('\n')
        marker_info= line.split()
        end= len(marker_info)
        x=4
        y=8

        while y<=end: #x<=end:
            intensities= marker_info[x:y]
            AA= intensities[0]
            BB= intensities[1]
            AB= intensities[2]
            NN= intensities[3]
            output_file.write ('%s' '	' '%s' '	' '%s' '	' % (AA, AB, BB))
            print(end, x, y, marker_info[x:y], AA, AB, BB)  # debug trace only

            x= y 
            y= x + 4

Output :

1.5 8.5 0.6 0 9.6 8.5 0.6 2.2 1.1
3 6 1.5 0.8 5.3 9.7 0.7 1.1 0.9
3.1 1.6 2.5 1.8 4.3 2.7 3.7 2.1 1.9
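
If you prefer not to track x and y by hand, the same grouping can be sketched with a stepped range. This is only an alternative sketch, not the script above: the function name reorder_line and its skip/group parameters are made up for illustration, and it assumes that an incomplete trailing group should simply be ignored.

```python
def reorder_line(line, skip=4, group=4):
    """Drop the first `skip` columns, then keep (AA, AB, BB) from each full group.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    values = line.split()[skip:]
    out = []
    # Step through the values one full group at a time; the range stops
    # before any trailing group that has fewer than `group` values.
    for start in range(0, len(values) - group + 1, group):
        aa, bb, ab, _nn = values[start:start + group]
        out.extend([aa, ab, bb])  # swap 2nd and 3rd values, drop the 4th
    return "\t".join(out)


sample = ("Title Value Position Perturbation "
          "1.5 0.6 8.5 9.8 0 8.5 9.6 0.5 0.6 1.1 2.2 3.3")
print(reorder_line(sample))
```

For the first input line this should print the same nine values as the first output line above, separated by tabs. Writing the result to a file would then be one output_file.write(reorder_line(line) + '\n') per input line.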
