Reputation: 8372
I'm new to Python and am having trouble capturing a multiline block of text on a Raspberry Pi.
I'm trying to capture the multiline text between HELLO and WORLD.
This example raises AttributeError: 'NoneType' object has no attribute 'group'.
import re
linestring = """
TEST TEST HELLO
TEST TEST PIZZA
TEST TEST WORLD
TEST TEST
"""
print(linestring)
m = re.search('HELLO(.*)WORLD', linestring)
print(m.group(1))
Upvotes: 1
Views: 77
Reputation: 1311
Jerry beat me to it. The re.DOTALL flag makes your regex do what you expect it to do (match across lines).
If you're new to regex in general (or just to regex in Python), I'd suggest using an interactive website like http://www.pythonregex.com/
If you plugged in your regex and search string without DOTALL:
>>> regex = re.compile("HELLO(.*)WORLD")
>>> r = regex.search(string)
# No match was found:
>>> r
# Run findall
>>> regex.findall(string)
[]
On the other hand, if you checked the DOTALL option, you'd see:
>>> regex = re.compile("HELLO(.*)WORLD",re.DOTALL)
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0x60c32711bbd32580>
>>> regex.match(string)
None
# List the groups found
>>> r.groups()
(u'\nTEST TEST PIZZA\nTEST TEST ',)
# List the named dictionary objects found
>>> r.groupdict()
{}
# Run findall
>>> regex.findall(string)
[u'\nTEST TEST PIZZA\nTEST TEST ']
It's also helpful for understanding what the different re functions give you, which isn't necessarily intuitive when you first start using them.
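To make the difference between those functions concrete, here is a minimal sketch using the string from the question (the variable names text, found and anchored are only illustrative). It assumes Python 3, so the results are plain str rather than the u'...' values shown above.

```python
import re

text = """
TEST TEST HELLO
TEST TEST PIZZA
TEST TEST WORLD
TEST TEST
"""

pattern = re.compile(r"HELLO(.*)WORLD", re.DOTALL)

# search() scans the whole string and returns the first match object, or None.
found = pattern.search(text)
print(repr(found.group(1)))

# match() only succeeds if the pattern matches at the very start of the string.
# The string starts with a newline, not HELLO, so this is None.
anchored = pattern.match(text)
print(anchored)

# findall() returns a list of strings; with exactly one group in the pattern,
# each item is the text captured by that group.
print(pattern.findall(text))
```

So search() and match() hand back a match object (or None), while findall() skips the match object and gives you the captured strings directly.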
Upvotes: 1
Reputation: 25974
You need to set the re.DOTALL flag in order for . to match newline characters.
In [13]: re.search('HELLO(.*)WORLD', linestring) is None
Out[13]: True
In [14]: re.search('HELLO(.*)WORLD', linestring, re.DOTALL) is None
Out[14]: False
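Since re.search returns None when nothing matches, calling .group on the result is what raises the AttributeError in the question. A small sketch of checking for that first; the helper name text_between and its parameters are hypothetical, not part of the re module:

```python
import re

def text_between(text, start="HELLO", end="WORLD"):
    """Return the text between start and end, or None if there is no match."""
    # re.escape keeps the markers literal even if they contain regex characters.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    m = re.search(pattern, text, re.DOTALL)
    # Only call .group() when a match object actually came back.
    return m.group(1) if m is not None else None

print(repr(text_between("a HELLO\nx\nWORLD b")))
print(text_between("no markers here"))
```

Returning None for the no-match case is just one option; raising a ValueError would work equally well if a missing block should be treated as an error.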
Upvotes: 1
Reputation: 71578
You can use:
m = re.search(r'HELLO([\s\S]*?)WORLD', linestring)
That's because . doesn't match newlines by default. Alternatively, use the re.DOTALL flag:
m = re.search(r'HELLO(.*?)WORLD', linestring, re.DOTALL)
This causes . to match newlines.
Note that I used a lazy quantifier (.*? rather than .*) so that if the input is:
linestring = """
TEST TEST HELLO
TEST TEST PIZZA
TEST TEST WORLD
TEST TEST WORLD
"""
The whole match would be:
HELLO
TEST TEST PIZZA
TEST TEST WORLD
Instead of:
HELLO
TEST TEST PIZZA
TEST TEST WORLD
TEST TEST WORLD
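The difference between the two results above can be checked with a short sketch like this one. The names lazy and greedy are only illustrative, and re.DOTALL is used for both patterns so that the quantifier is the only thing that changes.

```python
import re

linestring = """
TEST TEST HELLO
TEST TEST PIZZA
TEST TEST WORLD
TEST TEST WORLD
"""

# Lazy: .*? stops at the first WORLD it can reach.
lazy = re.search(r"HELLO(.*?)WORLD", linestring, re.DOTALL)

# Greedy: .* runs to the end and backtracks to the last WORLD.
greedy = re.search(r"HELLO(.*)WORLD", linestring, re.DOTALL)

print(lazy.group(0))
print("---")
print(greedy.group(0))
```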
By the way, you can also use the inline (?s) flag for DOTALL:
m = re.search(r'(?s)HELLO(.*?)WORLD', linestring)
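As a quick sanity check, the three spellings in this answer should capture the same text on the question's original string. A sketch, with the dictionary name patterns chosen only for illustration:

```python
import re

linestring = """
TEST TEST HELLO
TEST TEST PIZZA
TEST TEST WORLD
TEST TEST
"""

# Each entry is (pattern, flags); all three are meant to behave identically.
patterns = {
    "char class": (r"HELLO([\s\S]*?)WORLD", 0),
    "re.DOTALL": (r"HELLO(.*?)WORLD", re.DOTALL),
    "inline (?s)": (r"(?s)HELLO(.*?)WORLD", 0),
}

results = {
    name: re.search(pat, linestring, flags).group(1)
    for name, (pat, flags) in patterns.items()
}

for name, captured in results.items():
    print(name, repr(captured))
```

Which one to use is mostly a matter of taste; the inline form is handy when the pattern is stored somewhere that has no place to pass flags.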
Upvotes: 4