Prajwal

Reputation: 673

Python regex with newlines doesn't match

I have a file which contains

Line1
Line2
Line3
Line4

and in a Python program I am searching for

Line1
Line2
Line3

The program is

import re

file = open("blah.log","r")
file_contents = file.read()

pattern='''Line1
Line2
Line3'''

matchObj = re.search(pattern, file_contents, re.M|re.I)
if matchObj:
   print matchObj.group(0)
else:
   print "No match!!"

However, it reports no match even though the pattern is in the file.

But if I assign file_contents directly instead of reading it from the file:

file_contents = '''Line1
Line2
Line3
Line4''' # not reading from the file

then the regex pattern matches.

What is the reason for this?

How can I make the program work when the contents are read from the file?

Upvotes: 0

Views: 310

Answers (2)

PUNITH

Reputation: 11

The newline sequence in a file can be '\n', '\r', or '\r\n', depending on the OS that wrote it. To be on the safe side, write the pattern so that it matches any of them:

pattern='''Line1(\n|\r|\r\n)Line2(\n|\r|\r\n)Line3'''
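A variant of the same idea, as a minimal sketch in Python 3 syntax: the newline alternatives are kept in one raw-string fragment and joined between the lines, with '\r\n' listed first so the two-character ending is tried before the single characters. The names NEWLINE, BLOCK, and contains_block are made up for illustration.

```python
import re

# Accept Windows (\r\n), classic Mac (\r), or Unix (\n) line endings.
# \r\n is listed first so the two-character ending is tried before \r alone.
NEWLINE = r"(?:\r\n|\r|\n)"

# Hypothetical names; the three lines are the ones from the question.
BLOCK = re.compile(NEWLINE.join(["Line1", "Line2", "Line3"]), re.IGNORECASE)


def contains_block(text):
    """Return True if Line1, Line2, Line3 appear on consecutive lines of text."""
    return BLOCK.search(text) is not None


# Stand-ins for file contents written with each newline convention.
for sample in ("Line1\nLine2\nLine3\nLine4",
               "Line1\r\nLine2\r\nLine3\r\nLine4",
               "Line1\rLine2\rLine3\rLine4"):
    print(repr(sample), contains_block(sample))
```

Each of the three samples should report a match, whichever newline style the text uses.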

Upvotes: 1

blhsing

Reputation: 107095

Since the lines in your file are delimited by '\r\n', the pattern you search for should account for that.
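If you are not sure which delimiter a file actually uses, a quick check is to look at repr() of the contents and count each newline style. The sketch below uses Python 3 syntax and a literal string as a stand-in for file.read(); describe_newlines is a hypothetical helper, not a library function.

```python
def describe_newlines(text):
    """Count how often each newline style occurs in text."""
    crlf = text.count("\r\n")
    # A bare \r or \n is one that is not part of a \r\n pair.
    cr = text.count("\r") - crlf
    lf = text.count("\n") - crlf
    return {"crlf": crlf, "cr": cr, "lf": lf}


# Stand-in for file.read(); assumed here to use Windows line endings.
# Note: in Python 3, text mode translates newlines on read by default,
# so to inspect a real file open it with "rb" and decode the bytes.
file_contents = "Line1\r\nLine2\r\nLine3\r\nLine4"

print(repr(file_contents))          # escapes make the \r\n visible
print(describe_newlines(file_contents))
```

A nonzero "crlf" count with an '\n'-only pattern would explain the failed match.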

For convenience, you can still use triple quotes to initialize the string you want to search for, but then use the str.replace() method to replace all occurrences of '\n' with '\r\n':

pattern='''Line1
Line2
Line3'''.replace('\n', '\r\n')

Furthermore, if all you need is a substring match, you can use the in operator instead of the more costly regex match:

if pattern in file_contents:
   print pattern
else:
   print "No match!!"
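Another option, sketched here under the assumption that you do not need to preserve the original line endings, is to normalize the text you read rather than the pattern. The example uses Python 3 syntax; normalize_newlines is a hypothetical helper and the literal string stands in for file.read().

```python
def normalize_newlines(text):
    """Convert \\r\\n and bare \\r line endings to \\n."""
    # Replace \r\n first so its \r is not converted separately.
    return text.replace("\r\n", "\n").replace("\r", "\n")


pattern = "Line1\nLine2\nLine3"

# Stand-in for file.read(); assumed here to use Windows line endings.
file_contents = "Line1\r\nLine2\r\nLine3\r\nLine4"

if pattern in normalize_newlines(file_contents):
    print(pattern)
else:
    print("No match!!")
```

With this approach the triple-quoted pattern can stay as written, since both sides use '\n' after normalization.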

Upvotes: 2
