Tunji

Reputation: 31

Regex construction to obtain sentences from a text - Python

A sentence will be a sequence of characters that:

  1. is terminated by (but does not include) the characters ! ? . or the end of the file
  2. excludes whitespace on either end, and
  3. is not empty

I have a file that contains the following text:

this is the\nfirst sentence. Isn't\nit? Yes ! !! This \n\nlast bit :) is also a sentence, but \nwithout a terminator other than the end of the file\n

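By the above definition, there are four "sentences" in it:

  1. this is the\nfirst sentence
  2. Isn't\nit
  3. Yes
  4. This \n\nlast bit :) is also a sentence, but \nwithout a terminator other than the end of the file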

Notice that:

This is what I have at the moment: (.*\n+). I don't know how to refine it.

I need help with a regex that splits the text into the sentences above and returns them as a list. Thanks in advance for your help.

Upvotes: 1

Views: 76

Answers (2)

Shenglin Chen

Reputation: 4554

Try this:

# (?:...) is non-capturing, so re.split does not return the separators
re.split(r'(?:!\s!+)|\.|\?', s)
['this is the\nfirst sentence', " Isn't\nit", ' Yes ', ' This \n\nlast bit :) is also a sentence, but \nwithout a terminator other than the end of the file\n']
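A split-based approach can also be made to cover the other two rules from the question (no whitespace on either end, no empty sentences). The following is only a sketch under that reading of the rules: it splits on any run of !, ? or . rather than on the exact "! !!" sequence, and split_sentences is a made-up helper name, not something from the re module.

```python
import re

def split_sentences(text):
    # Hypothetical helper: split on runs of terminators,
    # then strip each piece and drop the ones that end up empty.
    pieces = re.split(r"[!?.]+", text)
    return [p.strip() for p in pieces if p.strip()]

s = ("this is the\nfirst sentence. Isn't\nit? Yes ! !! This \n\n"
     "last bit :) is also a sentence, but \n"
     "without a terminator other than the end of the file\n")

for sentence in split_sentences(s):
    print(repr(sentence))
```

Whitespace inside a sentence (such as the newlines in the first one) is left untouched; only the ends are trimmed.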

Upvotes: 0

Wiktor Stribiżew

Reputation: 626691

The following will not work for every input, but it works for your specific text. You may need to tweak the expression further:

([^!?.]+)[!?.\s]*(?![!?.])

See the regex demo.

Details:

  • ([^!?.]+) - Capturing group 1 matching 1 or more chars other than !, ?, .
  • [!?.\s]* - 0 or more !, ?, . or whitespace chars
  • (?![!?.]) - a negative lookahead that fails the match if the next char is !, ? or .
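To see how the three parts interact, the same pattern can be written with re.VERBOSE and run on a short fragment. This is just an illustration: the fragment "Yes ! !! This" is cut from the question's text, and printing group(0) next to group(1) shows what the whole match consumes versus what is actually captured.

```python
import re

# Same pattern, spread over several lines with re.VERBOSE so that
# each part can carry its own comment.
rx_verbose = re.compile(r"""
    ([^!?.]+)     # group 1: one or more chars other than !, ?, .
    [!?.\s]*      # zero or more terminators or whitespace chars
    (?![!?.])     # the next char must not be !, ? or .
""", re.VERBOSE)

fragment = "Yes ! !! This"
for m in rx_verbose.finditer(fragment):
    # group(0) is the whole match, group(1) is the captured sentence
    print(repr(m.group(0)), "->", repr(m.group(1)))
```

The first match consumes 'Yes ! !! ' but captures only 'Yes ', which is why the trailing space survives in the captured text.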

In Python, use it with re.findall, which returns only the captured substrings when the pattern contains a capturing group:

import re
rx = r"([^!?.]+)[!?.\s]*(?![!?.])"
s = "this is the\nfirst sentence. Isn't\nit? Yes ! !! This \n\nlast bit :) is also a sentence, but \nwithout a terminator other than the end of the file\n"
sents = re.findall(rx, s)
print(sents)
# => ['this is the\nfirst sentence',
#     "Isn't\nit",
#     'Yes ',
#     'This \n\nlast bit :) is also a sentence, but \nwithout a terminator other than the end of the file\n']

See Python demo
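Note that the captured strings keep trailing whitespace ('Yes ' and the final '\n'), while the question asks for sentences without whitespace on either end. One possible follow-up, assuming post-processing in Python is acceptable, is to strip each capture and drop any that end up empty. clean_sentences is a hypothetical helper name used only for this sketch.

```python
import re

RX = re.compile(r"([^!?.]+)[!?.\s]*(?![!?.])")

def clean_sentences(text):
    # Hypothetical helper: run the pattern, then enforce the
    # "no surrounding whitespace" and "not empty" rules in Python.
    stripped = (m.strip() for m in RX.findall(text))
    return [m for m in stripped if m]

s = ("this is the\nfirst sentence. Isn't\nit? Yes ! !! This \n\n"
     "last bit :) is also a sentence, but \n"
     "without a terminator other than the end of the file\n")

for sentence in clean_sentences(s):
    print(repr(sentence))
```

The filter matters because the capture group can match a whitespace-only run, for example the leading space in " . ".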

Upvotes: 1
