576i

Reputation: 8372

Trying to capture one large multiline block in Python

I'm new to Python and am having trouble capturing a multiline text block on a Raspberry Pi.

I'm trying to capture the multiline text between HELLO and WORLD.

This example throws the error AttributeError: 'NoneType' object has no attribute 'group'

import re

linestring = """
TEST TEST HELLO
TEST TEST PIZZA
TEST TEST WORLD
TEST TEST 
"""

print(linestring)

m = re.search('HELLO(.*)WORLD', linestring)
print(m.group(1)) 

Upvotes: 1

Views: 77

Answers (3)

James Wang

Reputation: 1311

Jerry beat me to it. The re.DOTALL flag makes your regex do what you expect it to do (match across lines).

If you're new to regex in general (or just to regex in Python), I'd suggest experimenting on an interactive website like http://www.pythonregex.com/

If you plug in your regex and search string without DOTALL, you get:

>>> regex = re.compile("HELLO(.*)WORLD")
>>> r = regex.search(string)
# No match was found:
>>> r

# Run findall
>>> regex.findall(string)
[]

On the other hand, if you checked the DOTALL option, you'd see:

>>> regex = re.compile("HELLO(.*)WORLD",re.DOTALL)
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0x60c32711bbd32580>
>>> regex.match(string)
None

# List the groups found
>>> r.groups()
(u'\nTEST TEST PIZZA\nTEST TEST ',)

# List the named dictionary objects found
>>> r.groupdict()
{}

# Run findall
>>> regex.findall(string)
[u'\nTEST TEST PIZZA\nTEST TEST ']

It's also helpful for understanding what the different re functions return, which isn't necessarily intuitive when you first start using them.
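To make the difference between those functions concrete, here is a minimal sketch that runs search, match, and findall on the same pattern. The variable name text is my own, and the string is the question's sample written with explicit \n escapes.

```python
import re

# The question's sample text, written with explicit newlines.
text = (
    "\n"
    "TEST TEST HELLO\n"
    "TEST TEST PIZZA\n"
    "TEST TEST WORLD\n"
    "TEST TEST \n"
)

pattern = re.compile(r"HELLO(.*)WORLD", re.DOTALL)

found = pattern.search(text)      # scans the whole string for the first match
anchored = pattern.match(text)    # only tries to match at the start of the string
captures = pattern.findall(text)  # list of group(1) captures, one per match

print(repr(found.group(1)))
print(anchored)
print(captures)
```

match returns None here only because the text does not begin with HELLO; search is usually what you want when the marker can appear anywhere in the string.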

Upvotes: 1

roippi

Reputation: 25974

You need to set the re.DOTALL flag in order for . to match newline characters.

re.search('HELLO(.*)WORLD', linestring) is None
Out[13]: True

re.search('HELLO(.*)WORLD', linestring, re.DOTALL) is None
Out[14]: False
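Since re.search returns None when nothing matches, calling .group() directly on the result is what raises the AttributeError from the question. A small sketch of guarding against that follows; extract_between is a hypothetical helper name of mine, not something from the re module.

```python
import re

linestring = (
    "\n"
    "TEST TEST HELLO\n"
    "TEST TEST PIZZA\n"
    "TEST TEST WORLD\n"
    "TEST TEST \n"
)


def extract_between(text, start="HELLO", end="WORLD"):
    """Return the text between start and end, or None if either is missing.

    Hypothetical helper for illustration only.
    """
    # re.escape keeps the markers literal even if they contain regex metacharacters.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    m = re.search(pattern, text, re.DOTALL)
    # Check for None before touching .group() to avoid the AttributeError.
    return m.group(1) if m else None


print(repr(extract_between(linestring)))
print(extract_between("no markers here"))
```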

Upvotes: 1

Jerry

Reputation: 71578

You can use:

m = re.search(r'HELLO([\s\S]*?)WORLD', linestring)

That's because . by default doesn't match newlines. Or use the re.DOTALL flag:

m = re.search(r'HELLO(.*?)WORLD', linestring, re.DOTALL)

This flag causes . to match newlines as well.

Note that I used a lazy quantifier (*? instead of *) so that the match stops at the first WORLD. For example, if you have:

linestring = """
TEST TEST HELLO
TEST TEST PIZZA
TEST TEST WORLD
TEST TEST WORLD
"""

The whole match would then be:

HELLO
TEST TEST PIZZA
TEST TEST WORLD

Instead of:

HELLO
TEST TEST PIZZA
TEST TEST WORLD
TEST TEST WORLD
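A quick way to check the greedy versus lazy behaviour yourself is to run both patterns against the two-WORLD sample. This is just a sketch; the names text, greedy, and lazy are my own.

```python
import re

# Sample with two WORLD markers, as in the example above.
text = (
    "TEST TEST HELLO\n"
    "TEST TEST PIZZA\n"
    "TEST TEST WORLD\n"
    "TEST TEST WORLD\n"
)

greedy = re.search(r"HELLO(.*)WORLD", text, re.DOTALL)   # runs to the last WORLD
lazy = re.search(r"HELLO(.*?)WORLD", text, re.DOTALL)    # stops at the first WORLD

print(greedy.group(0))
print("---")
print(lazy.group(0))
```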

By the way, you can also set the DOTALL flag inline with (?s):

m = re.search(r'(?s)HELLO(.*?)WORLD', linestring)
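If you want to convince yourself that the three spellings behave the same on this input, a short sketch like the following compares them. The names text, variants, and captures are mine.

```python
import re

text = (
    "\n"
    "TEST TEST HELLO\n"
    "TEST TEST PIZZA\n"
    "TEST TEST WORLD\n"
    "TEST TEST \n"
)

variants = [
    re.search(r"HELLO(.*?)WORLD", text, re.DOTALL),   # flag passed as an argument
    re.search(r"(?s)HELLO(.*?)WORLD", text),          # flag set inline in the pattern
    re.search(r"HELLO([\s\S]*?)WORLD", text),         # character class that matches anything
]

captures = [m.group(1) for m in variants]
print(captures)
```

Which one to use is mostly a matter of taste; the inline (?s) form is handy when you can only supply a pattern string and not flags.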

Upvotes: 4
