Reputation: 885
Somehow puzzled by the way regular expressions work in python, I am looking to replace all commas inside strings that are preceded by a letter and followed either by a letter or a whitespace. For example:
2015,1674,240/09,PEOPLE V. MICHAEL JORDAN,15,15
2015,2135,602832/09,DOYLE V ICON, LLC,15,15
The first line has effectively 6 columns, while the second line has 7 columns. Thus I am trying to replace the comma between (N, L) in the second line by a whitespace (N L) as so:
2015,2135,602832/09,DOYLE V ICON LLC,15,15
This is what I have tried so far, without success however:
new_text = re.sub(r'([\w],[\s\w|\w])', "", text)
Any ideas where I am wrong?
Help would be much appreciated!
Upvotes: 3
Views: 3385
Reputation: 4504
\w
matches a-z
,A-Z
and 0-9
, so your regex will replace all commas. You could try the following regex, and replace with \1\2
.
([a-zA-Z]),(\s|[a-zA-Z])
Upvotes: 0
Reputation: 626747
The pattern you use, ([\w],[\s\w|\w])
, is consuming a word char (= an alphanumeric or an underscore, [\w]
) before a ,
, then matches the comma, and then matches (and again, consumes) 1 character - a whitespace, a word character, or a literal |
(as inside the character class, the pipe character is considered a literal pipe symbol, not alternation operator).
So, the main problem is that \w
matches both letters and digits.
You can actually leverage lookarounds:
(?<=[a-zA-Z]),(?=[a-zA-Z\s])
See the regex demo
The (?<=[a-zA-Z])
is a positive lookbehind that requires a letter to be right before the ,
and (?=[a-zA-Z\s])
is a positive lookahead that requires a letter or whitespace to be present right after the comma.
Here is a Python demo:
import re
p = re.compile(r'(?<=[a-zA-Z]),(?=[a-zA-Z\s])')
test_str = "2015,1674,240/09,PEOPLE V. MICHAEL JORDAN,15,15\n2015,2135,602832/09,DOYLE V ICON, LLC,15,15"
result = p.sub("", test_str)
print(result)
If you still want to use \w
, you can exclude digits and underscore from it using an opposite class \W
inside a negated character class:
(?<=[^\W\d_]),(?=[^\W\d_]|\s)
Upvotes: 7