user2489612

Reputation: 73

Write out the matched lines in Python

I am trying to write the line numbers of the lines that match a pattern to one file, and the matched lines themselves to another.

Writing the matched line numbers works fine. However, Python neither writes the matched lines to the new file nor raises an error.

#!/usr/bin/env python
import re
outputLineNumbers = open('OutputLineNumbers', 'w')
outputLine = open('OutputLine', 'w')
inputFile = open('z.vcf','r')
matchLines = inputFile.readlines()    


total = 0
for i in range(len(matchLines)):
    line = matchLines[i]
#print out the matched line number    
    if re.match('(\w+)\|(\d+)\|(\w+)\|AGTA(\d+)\.(\d)\|\s(0+\d+)\s\.\s(\w)\s(\w),(\w)', line):
        total += 1
        outputLineNumbers.write( str(i+1) + "\n" )
#WRITE out the matched line
    if line == ('(\w+)\|(\d+)\|(\w+)\|AGTA(\d+)\.(\d)\|\s(0+\d+)\s\.\s(\w)\s(\w),(\w)'):
        outputLine.write( line + "\n" )
print "total polyploid marker is : ", total


outputLineNumbers.close()
inputFile.close()
outputLine.close()

Upvotes: 0

Views: 114

Answers (1)

Martijn Pieters

Reputation: 1121466

You tried to test if the line is equal to the pattern:

if line == ('(\w+)\|(\d+)\|(\w+)\|AGTA(\d+)\.(\d)\|\s(0+\d+)\s\.\s(\w)\s(\w),(\w)'):

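However, string equality does not invoke the regular expression engine just because one of the strings looks like a pattern. The == operator compares the two strings character by character, so the test is never true for your data lines and nothing is written.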
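To see the difference, here is a minimal sketch that runs both tests on one line. The sample line is invented for illustration (it is not taken from your z.vcf), and the pattern is written as a raw string so the backslashes reach the regex engine unchanged:

```python
import re

# The pattern from the question, written as a raw string.
PATTERN = r'(\w+)\|(\d+)\|(\w+)\|AGTA(\d+)\.(\d)\|\s(0+\d+)\s\.\s(\w)\s(\w),(\w)'

# Hypothetical sample line, made up for this example.
sample = 'gi|12345|gb|AGTA01000001.1|\t000123\t.\tA\tG,T\n'

# == compares the two strings character by character;
# the text of the pattern is not the text of the line.
equal = (sample == PATTERN)

# re.match() interprets the pattern and tests it against the start of the line.
matched = re.match(PATTERN, sample) is not None

print(equal, matched)
```

Only the re.match() test succeeds, which is why the line numbers were written but the lines were not.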

Remove the if line == test and just write out the matched line as part of the preceding if block:

if re.match('(\w+)\|(\d+)\|(\w+)\|AGTA(\d+)\.(\d)\|\s(0+\d+)\s\.\s(\w)\s(\w),(\w)', line):
    total += 1
    outputLineNumbers.write( str(i+1) + "\n" )
    # Write out the matched line; it still ends with the newline kept by readlines()
    outputLine.write(line)

Note that you can just loop over matchLines directly; use the enumerate() function to produce a running index here instead:

for i, line in enumerate(matchLines, 1):
    if re.match('(\w+)\|(\d+)\|(\w+)\|AGTA(\d+)\.(\d)\|\s(0+\d+)\s\.\s(\w)\s(\w),(\w)', line):
        total += 1
        outputLineNumbers.write("{}\n".format(i))

where i starts at 1, so there is no need to add 1 later on either.
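Putting both pieces together, a sketch of the whole loop might look like the following. The function name write_matches and the sample lines are my own inventions for illustration, and io.StringIO objects stand in for your real input and output files so the example runs on its own. It writes each matched line as-is, on the assumption that lines read from a file keep their trailing newline:

```python
import io
import re

# The pattern from the question, compiled once as a raw string.
PATTERN = re.compile(
    r'(\w+)\|(\d+)\|(\w+)\|AGTA(\d+)\.(\d)\|\s(0+\d+)\s\.\s(\w)\s(\w),(\w)')

def write_matches(lines, numbers_out, lines_out):
    """Write the 1-based number and the text of each matching line.

    Returns the number of matching lines. (Hypothetical helper.)
    """
    total = 0
    for i, line in enumerate(lines, 1):
        if PATTERN.match(line):
            total += 1
            numbers_out.write("{}\n".format(i))
            # The line already ends with its own newline, so write it unchanged.
            lines_out.write(line)
    return total

# Made-up sample data; only the second line fits the pattern.
sample_lines = [
    '##fileformat=VCFv4.1\n',
    'gi|12345|gb|AGTA01000001.1|\t000123\t.\tA\tG,T\n',
    'gi|12345|gb|AGTA01000001.1|\t000456\t.\tC\tT\n',
]
numbers_out = io.StringIO()
lines_out = io.StringIO()
total = write_matches(sample_lines, numbers_out, lines_out)
```

With real files you would pass the open file objects instead, ideally opened in with blocks so they are closed even if an error occurs.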

Upvotes: 2
