Layray

ignoring newline character in regex match

I am trying to replace all matching occurrences with their title-case form using the following script. When there is a newline character between the filter words (in this case 'ABC' and 'DEF'), that line doesn't get replaced as intended.

How can I ignore the newline character in this case?

Edit: I don't want to strip all newline characters entirely from the string, but only strip those between the filter words.

Edit2: I edited the text and script to better reflect the issue I am experiencing. If I include the flags=re.DOTALL argument, I get:

  mmm    = "Hello Hello Hello Hello Hello Hello
              Hello Hello Hello Hello",
  Bbb   = "Bbb",

whereas the output I want is (notice that bbb is not capitalized):

  mmm    = "Hello Hello Hello Hello Hello Hello
              Hello Hello Hello Hello",
  bbb   = "bbb",

The following is the script I am using.

import re

test_string = '''
  mmm    = "hello hello hello hello hello hello
              hello hello hello hello",
  bbb   = "bbb",
'''

rex = r'(?<= mmm)(.*)(?=\")'

def maketitle(match_obj):
    return match_obj.group(0).title()

formatted = re.sub(rex, maketitle, test_string, flags=re.DOTALL)

print(formatted)
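A small diagnostic sketch of why the DOTALL version also capitalizes bbb. It reuses the question's string and pattern unchanged and only inspects where the match ends; the variable names match and no_dotall are illustrative.

```python
import re

test_string = '''
  mmm    = "hello hello hello hello hello hello
              hello hello hello hello",
  bbb   = "bbb",
'''

rex = r'(?<= mmm)(.*)(?=\")'

# With DOTALL, the greedy .* runs through both newlines to the end of the
# string, then backtracks only until the lookahead sees a double quote,
# which is the LAST quote in the string (the one closing "bbb").
match = re.search(rex, test_string, flags=re.DOTALL)
print(repr(match.group(0)))                   # the captured text includes bbb
print(match.end() == test_string.rfind('"'))  # True

# Without DOTALL, '.' cannot cross the first newline, so the only quote it
# can reach is the opening quote on the mmm line: the match is just the
# whitespace and '=' before that quote, which title() leaves unchanged.
no_dotall = re.search(rex, test_string)
print(repr(no_dotall.group(0)))
```

So neither variant stops at the closing quote of the mmm value: one stops too early, the other too late.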


Answers (2)

Pierre-Antoine

The following code gives the result you expect:

import re

test_string = '''
  mmm    = "hello hello hello hello hello hello
              hello hello hello hello",
  bbb   = "bbb",
'''

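# After " mmm", skip "=" and any surrounding whitespace, then match the opening
# quote and everything up to (not including) the next double quote.
# [^"] also matches newlines, so no DOTALL flag is needed.
rex = r'(?<= mmm)\s*=\s*"[^"]*'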

def maketitle(match_obj):
    return match_obj.group(0).title()

formatted = re.sub(rex, maketitle, test_string)

print(formatted)

I'm assuming that the value you want to "title-case" is always between double quotes, and that it cannot contain a double quote (escaped in some way). Handling escaping would be possible with a slightly more complex regex, though.
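One possible version of that more complex regex, as a sketch that assumes quotes inside the value are escaped with a backslash (\"). The input string below is hypothetical and was made up to contain an escaped quote.

```python
import re

# Hypothetical input: the mmm value contains an escaped quote and spans two lines.
test_string = r'''
  mmm    = "hello \"quoted\" hello
              hello hello",
  bbb   = "bbb",
'''

# Value body: any character that is neither a quote nor a backslash, or a
# backslash followed by any character (so \" does not end the value).
# DOTALL lets the '.' after the backslash match a newline as well.
rex = r'(?<= mmm)\s*=\s*"(?:[^"\\]|\\.)*'

def maketitle(match_obj):
    return match_obj.group(0).title()

formatted = re.sub(rex, maketitle, test_string, flags=re.DOTALL)
print(formatted)
```

The match stops at the first unescaped quote, so the escaped quotes stay inside the title-cased value and the bbb line is untouched.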


Mikhail Burshteyn

Use the re.DOTALL flag:

formatted = re.sub(rex, maketitle, test_string, flags=re.DOTALL)
print(formatted)

According to the docs:

re.DOTALL
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
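A minimal illustration of that documented behavior, using the question's original filter words 'ABC' and 'DEF' in a hypothetical two-line string.

```python
import re

text = "ABC\nDEF"

# Without DOTALL, '.' does not match '\n', so the pattern fails.
plain = re.search(r"ABC.DEF", text)
print(plain)  # None

# With DOTALL, '.' matches the newline too.
dotall = re.search(r"ABC.DEF", text, flags=re.DOTALL)
print(dotall.group(0) == text)  # True

# The same flag can be switched on inline with (?s) at the start of the pattern.
inline = re.search(r"(?s)ABC.DEF", text)
print(inline.group(0) == text)  # True
```

Note that DOTALL only widens what '.' can match; a greedy .* still extends as far as it can, which is why the pattern in the question's Edit2 runs past the mmm value.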
