eamon1234

Reputation: 1585

Python re.compile between two html tags

This should be quite straightforward, but I can't quite twig it. I want to get the name from this HTML string:

  soup =   </ul>
  Brian
  <p class="f">

I've tried:

namePattern = re.compile(r'(?<=</ul>)(.*?)(?<=<p)')
rev.reviewerName = re.findall(namePattern,  str(soup))

and

namePattern = re.compile(r'</ul>(.*?)<p')

Can you tell me how to do it? Thanks.

Upvotes: 1

Views: 1310

Answers (1)

NPE

Reputation: 500853

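By default, . doesn't match newlines, so neither pattern can get across the line breaks between </ul> and <p. You need to specify re.DOTALL as the second argument to re.compile(). The first pattern also ends with a lookbehind, (?<=<p), where a lookahead, (?=<p), is needed; otherwise the capture would include the <p itself.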
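
To see the difference the flag makes, here is a minimal sketch. The html string is an assumption reconstructed from the snippet in the question, and the variable names are only illustrative:

```python
import re

# Assumed input, reconstructed from the question's snippet.
html = '</ul>\nBrian\n<p class="f">'

# Without re.DOTALL, "." stops at the newline, so nothing matches.
without_flag = re.findall(r'</ul>(.*?)<p', html)

# With re.DOTALL, "." also matches "\n", so the capture spans the lines.
name_pattern = re.compile(r'</ul>(.*?)<p', re.DOTALL)
with_flag = name_pattern.findall(html)

print(without_flag)  # []
print(with_flag)     # ['\nBrian\n']
```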

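Note that this will include the newlines as part of your capture group. If you don't want that, you can explicitly match them outside the group with \s* (here s holds the HTML as a string, and re.DOTALL is not needed as long as the name sits on one line):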

In [5]: re.findall(r'</ul>\s*(.*?)\s*<p', s)
Out[5]: ['Brian']
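
If you only expect one name, the same pattern can be wrapped up roughly like this. It is a sketch rather than a definitive implementation: extract_name is a hypothetical helper, and the sample string is modelled on the question's snippet, not taken from a real page:

```python
import re

# Hypothetical sample input modelled on the question's snippet.
s = '  </ul>\n  Brian\n  <p class="f">'

def extract_name(text):
    """Return the text between </ul> and the next <p, or None if absent."""
    # \s* on either side absorbs newlines and indentation outside the group.
    match = re.search(r'</ul>\s*(.*?)\s*<p', text)
    return match.group(1) if match else None

print(extract_name(s))  # Brian
```

Using re.search with a None check avoids indexing into an empty list when the tags are missing.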

Upvotes: 3
