Michael K

Reputation: 439

Python regex negative lookbehind including start of line

Consider the following input:

"aaa"|"bbb"|"123"|"!"\\"|"2010-01-04T00:00:01"

I am trying to write a regex that will capture and replace the double quote character with tilde if...

In PHP I am able to get the regex pictured below working... [image: php_regex]

Because of constraints in Python's re module, the same regex fails with the following error:

re.error: look-behind requires fixed-width pattern

My Python code is as follows:

import re
orig_line = r'"aaa"|"bbb"|"123"|"!"\\"|"2010-01-04T00:00:01"'
new_line = re.sub(pattern=r'(?<!\||^)"(?!\||$)', repl='~', string=orig_line)

How can I adjust this regex so it works in Python?

Similar questions exist on SO, but I couldn't find any that address the start/end of line requirement.

Upvotes: 2

Views: 752

Answers (2)

Daweo

Reputation: 36380

I would approach it the following way: since you are interested in a " that is not at the start, you can express that as requiring one non-newline character before it, i.e. using a positive lookbehind:

import re
orig_line = r'"aaa"|"bbb"|"123"|"!"\\"|"2010-01-04T00:00:01"'
new_line = re.sub(pattern=r'(?<=.)(?<!\|)"(?!\||$)', repl='~', string=orig_line)
print(new_line)

output:

"aaa"|"bbb"|"123"|"!~\\"|"2010-01-04T00:00:01"

If you are not limited to the Python standard library, I suggest trying the third-party regex module, which does support variable-length lookbehinds. For example:

import regex as re
text = "a1aa2aaa3aaaa4"
print(re.findall('(?<=a{3,})[0-9]', text))

output:

['3', '4']

Upvotes: 1

Wiktor Stribiżew

Reputation: 626748

You can use

(?<=[^|])

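The (?<=[^|]) lookbehind matches a location that is immediately preceded by any character other than |, so it cannot match at the start of the string. The (?=[^|]) lookahead used in the demo works the same way at the end of the string.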

See the Python demo:

import re
orig_line = '"aaa"|"bbb"|"123"|"!"\\"|"2010-01-04T00:00:01"'
new_line = re.sub(r'(?<=[^|])"(?=[^|])', '~', orig_line)
print(new_line) # => "aaa"|"bbb"|"123"|"!~\"|"2010-01-04T00:00:01"
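
To check how this behaves on a few edge cases, the pattern can be wrapped in a small helper. This is only a sketch: the name replace_inner_quotes and the sample strings are made up for illustration, and it assumes each line is passed in without a trailing newline (a newline counts as a non-| character, so a closing quote followed by one would be replaced too).

```python
import re

# Same pattern as in the demo above, compiled once for reuse.
INNER_QUOTE = re.compile(r'(?<=[^|])"(?=[^|])')

def replace_inner_quotes(line, repl='~'):
    """Replace every quote that has a non-pipe character on both sides."""
    return INNER_QUOTE.sub(repl, line)

samples = [
    '"a"b"|"c"',   # one stray quote inside the first field
    '"a""b"',      # two adjacent stray quotes
    '""|"x"',      # empty field, nothing to replace
]
for sample in samples:
    print(sample, '->', replace_inner_quotes(sample))
```

A quote that sits right next to a | inside a field is still treated as a delimiter, so that case would need a different approach.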

Upvotes: 1
