Reputation: 147
Suppose I have a string
line = "Apple is $3.00 higher than banana. Banana is cheap.I hate apple."
I know that normally line.split(".") can split most sentences. However, in this case I only want to split at a . that is not followed by a digit. Normally I might use re.split('\.[^0-9]', line), but I don't want to lose the first character after each split.
Here is the output of re.split('\.[^0-9]', line):
['Apple is $3.00 higher than banana', 'Banana is cheap', ' hate apple.']
We can see that the I is dropped.
Upvotes: 2
Views: 864
Reputation: 574
To match a pattern without consuming the text that follows it, you can use positive and negative lookaheads. These peek at what comes next without including it in the match.
Below is an example of a positive lookahead, which matches every occurrence of patA that is followed by patB.
Note that patA and patB are placeholders for regex patterns.
patA(?=patB)
Below is an example of a negative lookahead, which matches every occurrence of patA that is NOT followed by patB.
patA(?!patB)
In your case, every dot that is not followed by a digit can be matched with
\.(?![0-9])
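As a rough sketch of how this looks in Python, using the string from the question: the variable names sentence_dot, parts and decimal_dots are only illustrative, and the positive lookahead is included purely for contrast.

```python
import re

line = "Apple is $3.00 higher than banana. Banana is cheap.I hate apple."

# Negative lookahead: match a dot only when the next character is not a digit.
# The lookahead is zero-width, so the character after the dot is not consumed.
sentence_dot = re.compile(r"\.(?![0-9])")
parts = sentence_dot.split(line)

# Positive lookahead, for contrast: positions of dots that ARE followed by a digit.
decimal_dots = [m.start() for m in re.finditer(r"\.(?=[0-9])", line)]

print(parts)
print(decimal_dots)
```

The dot inside 3.00 is only found by the positive lookahead, so the split leaves the price intact.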
Upvotes: 4
Reputation: 8344
Negative lookahead:
re.split(r'\.(?![0-9])', line)
Result:
['Apple is $3.00 higher than banana', ' Banana is cheap', 'I hate apple', '']
Since the I stands directly after the dot, your regular expression '\.[^0-9]' removes it: .I matches as a separator, and re.split discards all separators.
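If the leading space and the trailing empty string are unwanted, one possible cleanup (an addition beyond what the question asked for) is to strip each piece and drop the empty ones. The names consuming, lookahead and sentences are just for illustration.

```python
import re

line = "Apple is $3.00 higher than banana. Banana is cheap.I hate apple."

# Consuming pattern: the character after the dot is part of the separator,
# so it is discarded along with the dot.
consuming = re.split(r"\.[^0-9]", line)

# Lookahead pattern: only the dot itself is the separator.
lookahead = re.split(r"\.(?![0-9])", line)

# Optional cleanup: trim surrounding whitespace and drop empty pieces.
sentences = [s.strip() for s in lookahead if s.strip()]

print(consuming)
print(sentences)
```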
Upvotes: 2