Andrew Voorhees

Reputation: 23

Python regex for everything up to a blank line

I'm looking for a regex that will get me everything in a piece of text up to the first blank line. I have the following:

import re

reg = r'((Opposition|Oppose):?\s*)(.*?)\n\n'
str1 = """Opposition


          California Attorneys for Criminal Justice
          Californians for Safety and Justice
          Drug Policy Alliance
          Friends Committee on Legislation of California
          Legal Services for Prisoners with Children


           Analysis Prepared  
"""

str2 = """Oppose:   None received

                                      -- END --

                                      """

When I run:

match = re.search(reg, str1, re.DOTALL)
print match.group(3)

I get:

      California Attorneys for Criminal Justice
      Californians for Safety and Justice
      Drug Policy Alliance
      Friends Committee on Legislation of California
      Legal Services for Prisoners with Children

But when I run:

match = re.search(reg, str2, re.DOTALL)
print match.group(3)

I get:

   None received
                                      -- END --

The outcome for the first string is correct, but what I want from the second string is just "None received". I can't come up with a good explanation for why I get the "-- END --" as well. Shouldn't my regex match the \n after "None received" as well as the \n on the blank line, and stop there? Any help would be appreciated.

Upvotes: 2

Views: 1719

Answers (1)

Wiktor Stribiżew

Reputation: 627101

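The "blank" line after "None received" most likely contains spaces, so \n\n does not match there. You can make sure you also match whitespace-only lines with [^\S\n]*, which matches zero or more whitespace characters other than a newline (literally: characters that are neither non-whitespace nor a newline):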

((Oppos(?:e|ition)):?\s*)(.*?)\n[^\S\n]*\n[^\S\n]*

See demo

I also shortened the 2nd capture group a bit.

Here is an IDEONE demo
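
As a rough, self-contained check of the pattern above, the sketch below runs both the original and the revised regex against a shortened version of the second string. It assumes the "blank" line after "None received" holds three spaces, which the question does not show explicitly. The helper name first_block is made up for illustration.

```python
import re

# Pattern from the answer above. [^\S\n]* matches spaces and tabs but not
# newlines, so a line holding only whitespace still counts as blank.
REG = re.compile(r'((Oppos(?:e|ition)):?\s*)(.*?)\n[^\S\n]*\n[^\S\n]*', re.DOTALL)

# The question's original pattern, kept for comparison.
OLD = re.compile(r'((Opposition|Oppose):?\s*)(.*?)\n\n', re.DOTALL)

# Assumption: the "blank" line after "None received" contains three spaces.
str2 = "Oppose:   None received\n   \n        -- END --\n\n        "


def first_block(pattern, text):
    """Return capture group 3 of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(3) if m else None


print(repr(first_block(OLD, str2)))  # runs on past the whitespace-only line
print(repr(first_block(REG, str2)))  # stops at the whitespace-only line
```

Under that assumption, the original pattern only stops at the truly empty line after "-- END --", while the revised one returns just "None received".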

Upvotes: 1
