Martin

Reputation: 53

How to exclude floating-point numbers from a Python regular expression that splits on a dot

I'm new to regular expressions. I'm trying to split a string in Python whenever it encounters ., !, ? or \n:

re.split(r'\?|\.|\!|\n', input_string)
  1. However, this also splits on the decimal point of floating-point numbers. How can I avoid that?
  2. How can I keep the delimiter character in the results?

Here is an example input and the output I expect:

input_string = "hi I need help with re. I have 99.9% chance to get help here!"
output = ["hi I need help with re.", "I have 99.9% chance to get help here!"]

Upvotes: 1

Views: 509

Answers (1)

anubhava

Reputation: 785581

  1. Use findall instead of split, since you want to capture the text up to and including each delimiter.
  2. Use a lookbehind and a lookahead around the dot so that the decimal point of a floating-point number is not matched.

Regex:

\S.*?(?:[?!\n]|(?<!\d)\.(?!\d))

RegEx Details:

  • \S: Match a non-whitespace character
  • .*?: Match 0 or more of any character except a newline, as few as possible (lazy)
  • (?:: Start non-capture group
    • [?!\n]: Match one of these characters inside [...]
    • |: OR
    • (?<!\d)\.(?!\d): Match a dot only if it is neither preceded nor followed by a digit
  • ): End non-capture group
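
To see what the two lookarounds do on their own, here is a small sketch that applies only the dot rule to a few strings. One consequence worth knowing: because the lookbehind rejects any dot that follows a digit, a sentence ending in a number ("version 2.") is not treated as a sentence end. The names STRICT_DOT, LOOSE_DOT and dot_positions are made up for this illustration, and the looser lookahead-only variant is an assumption of mine rather than part of the regex above.

```python
import re

# The dot rule from the regex above: no digit on either side of the dot.
STRICT_DOT = re.compile(r'(?<!\d)\.(?!\d)')
# Hypothetical looser variant (not part of the answer): lookahead only.
LOOSE_DOT = re.compile(r'\.(?!\d)')

def dot_positions(pattern, text):
    """Return the indexes at which the pattern matches a dot in text."""
    return [m.start() for m in pattern.finditer(text)]

for sample in ["re. I", "99.9%", "version 2. Next"]:
    print(repr(sample),
          dot_positions(STRICT_DOT, sample),
          dot_positions(LOOSE_DOT, sample))
```

Both patterns skip the dot in 99.9%, but only the lookahead-only variant accepts the dot after "version 2". Which behaviour you want depends on your data.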

Code:

import re

input_string = "hi I need help with re. I have 99.9% chance to get help here!"
print(re.findall(r'\S.*?(?:[?!\n]|(?<!\d)\.(?!\d))', input_string))

Output:

['hi I need help with re.', 'I have 99.9% chance to get help here!']
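
If you would rather stay with re.split, one possible alternative is to split on the whitespace that follows the punctuation instead of on the punctuation itself. This is a different technique from the findall regex above: the lookbehind leaves the ., ! or ? attached to the preceding piece, and the dot in 99.9 is never a split point because no whitespace follows it. A rough sketch, assuming sentences are separated by whitespace; split_sentences is a hypothetical helper name.

```python
import re

def split_sentences(text):
    """Split after ., ! or ? when whitespace follows, and on newlines."""
    # (?<=[.!?])\s+ consumes only the whitespace, so the punctuation
    # stays in the preceding piece; \n+ handles bare line breaks.
    parts = re.split(r'(?<=[.!?])\s+|\n+', text)
    # Drop empty strings, e.g. from a trailing newline.
    return [p for p in parts if p]

input_string = "hi I need help with re. I have 99.9% chance to get help here!"
print(split_sentences(input_string))
```

Unlike the findall version, the newline itself is dropped from the results, and a dot with no whitespace after it ("Hi.There") does not split. Abbreviations such as "e.g. " will still be split.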

Upvotes: 2
