I'm trying to segment a paragraph into sentences. I chose '.', '?' and '!' as the sentence delimiters. I tried:
format = r'((! )|(. )|(? ))'
delimiter = re.compile(format)
s = delimiter.split(line)
but it gives me sre_constants.error: unexpected end of pattern
I also tried
format = [r'(! )',r'(? )',r'(. )']
delimiter = re.compile(r'|'.join(format))
It also raises an error.
What's wrong with my method?
The . (wildcard) and ? (zero-or-one quantifier) characters are special in regular expressions, so you need to escape them (\. and \?) to match them literally.
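For reference, here is a minimal sketch of both of your attempts with the metacharacters escaped. The sample sentence is made up for illustration. The first pattern drops your capturing groups so that re.split returns only the text between delimiters; the second passes each delimiter through re.escape, which backslash-escapes regex metacharacters.

```python
import re

line = "Hello world. How are you? Fine! Bye."

# '.' and '?' are escaped, so they match literal characters instead of
# acting as the wildcard and the zero-or-one quantifier.
# No capturing groups: re.split returns only the pieces between delimiters.
delimiter = re.compile(r'! |\. |\? ')
print(delimiter.split(line))  # ['Hello world', 'How are you', 'Fine', 'Bye.']

# The list-based attempt also works once each delimiter is escaped.
parts = ['! ', '? ', '. ']
delimiter2 = re.compile('|'.join(re.escape(p) for p in parts))
print(delimiter2.split(line))  # ['Hello world', 'How are you', 'Fine', 'Bye.']
```

Note that with the original parenthesised pattern (escaped), re.split would also return the matched groups in the result, plus None for each alternative that did not take part in a match.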
However, in your case it would be much simpler to use a character class (inside which these characters aren't special anymore):
s = re.split(r'[!.?] ', line)
A character class [...] stands for "one character, any of the ones included inside the character class".
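A quick sketch showing that '.' and '?' are literal inside a character class, and the split in action on a made-up sample. The last line is an optional variation, assuming your input may contain several spaces or a newline after the punctuation.

```python
import re

line = "Hello world. How are you? Fine! Bye."

# Inside [...], '.' and '?' lose their special meaning: the class matches
# exactly one character that is '!', '.' or '?'.
print(re.findall(r'[!.?]', "a.b?c!d"))  # ['.', '?', '!']

# One punctuation character followed by a single space is the delimiter.
print(re.split(r'[!.?] ', line))  # ['Hello world', 'How are you', 'Fine', 'Bye.']

# Variation: \s+ also accepts several spaces or a newline after the punctuation.
print(re.split(r'[!.?]\s+', "One.  Two?\nThree!"))  # ['One', 'Two', 'Three!']
```

The final sentence keeps its punctuation because no whitespace follows it, so the delimiter never matches there.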