Augustin Pan

Reputation: 147

How to use regex to split on a period that is not followed by a digit?

Suppose I have a string

line = "Apple is $3.00 higher than banana. Banana is cheap.I hate apple."

I know that line.split(".") splits most sentences. In this case, however, I only want to split at a . that is not followed by a digit. I could use re.split('\.[^0-9]', line), but that also removes the first character after the period.

Here is the output of re.split('\.[^0-9]', line):

['Apple is $3.00 higher than banana', 'Banana is cheap', ' hate apple.']

Notice that the I at the start of the last sentence has been dropped.

Upvotes: 2

Views: 864

Answers (2)

138

Reputation: 574

To match without consuming the text that follows, you can use positive and negative lookaheads. A lookahead is a zero-width assertion: it peeks at what comes next without including it in the match. Below is a positive lookahead, which matches patA only where it is followed by patB.
Note that patA and patB are placeholders for regex patterns.

patA(?=patB) 

Below is a negative lookahead, which matches patA only where it is NOT followed by patB.

patA(?!patB) 

In your case, a period that is not followed by a digit is written as

\.(?![0-9])
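As a rough illustration of the two lookaheads above, the sketch below applies both to the string from the question. The variable names (followed_by_digit, not_followed_by_digit, matches) are made up for this example, and the raw-string prefix r"..." is a choice of this sketch, not something the question used.

```python
import re

line = "Apple is $3.00 higher than banana. Banana is cheap.I hate apple."

# Positive lookahead: positions of periods that ARE followed by a digit
# (here, only the decimal point in "$3.00").
followed_by_digit = [m.start() for m in re.finditer(r"\.(?=[0-9])", line)]

# Negative lookahead: positions of periods that are NOT followed by a digit
# (the three sentence-ending periods).
not_followed_by_digit = [m.start() for m in re.finditer(r"\.(?![0-9])", line)]

# A lookahead is zero-width, so each match consists of the "." alone;
# the character after it is never part of the match.
matches = re.findall(r"\.(?![0-9])", line)

print(len(followed_by_digit), len(not_followed_by_digit), matches)
```

Because the match contains only the period, using this pattern as a separator should leave the following character untouched.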

Upvotes: 4

Andriy Makukha

Reputation: 8344

Negative lookahead:

re.split('\.(?![0-9])', line)

Result:

['Apple is $3.00 higher than banana', ' Banana is cheap', 'I hate apple', '']

Since I stands directly after the period, your regular expression '\.[^0-9]' matches .I as a separator, and re.split removes every separator, so the I is lost. The lookahead version matches only the period itself.
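The result above still contains a leading space and a trailing empty string. If those are unwanted, one possible cleanup (an assumption about what the asker wants, not something stated in the question) is to strip each piece and drop the empty ones. The names parts and sentences are illustrative only.

```python
import re

line = "Apple is $3.00 higher than banana. Banana is cheap.I hate apple."

# Split on periods that are not followed by a digit.
parts = re.split(r"\.(?![0-9])", line)

# Trim surrounding whitespace and discard empty pieces, such as the one
# produced by the period at the very end of the string.
sentences = [p.strip() for p in parts if p.strip()]

print(sentences)
# ['Apple is $3.00 higher than banana', 'Banana is cheap', 'I hate apple']
```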

Upvotes: 2
