Reputation: 1111
I am trying to split a string on the following delimiters: full stop, semicolon, *, +, ? and -. However, I only want to split on '-' when it occurs at the beginning of a sentence, so as not to split words like "non-functional".
I tried the following but I am not making any progress. Any help will be appreciated:
sentences = re.split("[.-;]*[\+]*[\?]*[\*]*", txt)
Here is the sample text I've been trying this on:
- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon
* See this case mis-alignment
Expected output after the split is a list of items:
Text Editor: Now you can edit plain text files with airport tools, Updated Dropbox support, Improved stability, New icon, See this case mis-alignment
Upvotes: 0
Views: 71
Reputation: 10951
If you want to split your string on a defined set of delimiters, then do it this way:
>>> import re
>>> txt = '- Text Editor: Now you can edit plain text files with airport tools'
>>> r = re.split(r'([.;*+?-]+)',txt)
>>> r
['', '-', ' Text Editor: Now you can edit plain text files with airport tools']
If you don't want those delimiters to appear in the resulting list, then:
>>> r = re.split(r'[.;*+?-]+',txt)
>>> r
['', ' Text Editor: Now you can edit plain text files with airport tools']
EDIT: in response to your comment below, use \s for spaces:
>>> txt = '''- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon'''
>>> r = re.split(r'(^|\s)+[.;*+?-]+($|\s)+', txt)
>>> [i for i in r if len(i) > 1]
['Text Editor: Now you can edit plain text files with airport tools', 'Updated Dropbox support', 'Improved\n stability', 'New icon']
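The `len(i) > 1` filter above is there because the capturing groups put the matched whitespace back into the result, and it would also drop any legitimate one-character item. A minimal sketch of the same idea with non-capturing groups, so that only empty strings need filtering (the sample text is retyped here without indentation, which is an assumption about the original input):

```python
import re

txt = """- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon"""

# (?:...) groups without capturing, so re.split does not return the
# whitespace around each delimiter as separate list entries.
pieces = re.split(r"(?:^|\s)+[.;*+?-]+(?:$|\s)+", txt)

# The text starts with a delimiter, so the first piece is empty; drop it.
items = [p for p in pieces if p]
print(items)
```

Note that this pattern still needs whitespace (or the start of the string) before a delimiter, so a full stop directly after a word is not treated as a split point.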
Upvotes: 1
Reputation: 174706
You may use this re.split function.
>>> import re
>>> s = '''- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon'''
>>> [i for i in re.split(r'(?m)\s*^[-*+?]+\s*', s) if i]
['Text Editor: Now you can edit plain text files with airport tools', 'Updated Dropbox support', 'Improved\nstability', 'New icon']
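For the full rule in the question (split on . ; * + ? anywhere, but on - only at the start of a line), one possible sketch is below. The helper name `split_items` and the whitespace normalisation are assumptions added for illustration; they are not part of the answer above.

```python
import re

# First alternative: a hyphen only when it opens a line (MULTILINE mode).
# Second alternative: any run of the other delimiters, wherever they occur.
DELIMITERS = re.compile(r"(?m)^\s*-\s*|\s*[.;*+?]+\s*")

def split_items(text):
    """Hypothetical helper: split on the delimiters and tidy each piece."""
    parts = DELIMITERS.split(text)
    # Collapse internal whitespace (including newlines) and drop empty pieces.
    return [" ".join(p.split()) for p in parts if p.strip()]

sample = """- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon
* See this case mis-alignment"""

items = split_items(sample)
print(items)
```

Because the hyphen is only accepted right after a line start, "mis-alignment" stays in one piece. Splitting on every full stop is what the question asks for, but it would also break strings such as version numbers or abbreviations, so that part may need tightening for real text.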
Upvotes: 1
Reputation: 980
Try enumerating your delimiters like this:
re.split("[.;*+?] ", txt)
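Enumerating the delimiters also sidesteps a problem in the pattern from the question: inside brackets, `.-;` is read as a range from '.' to ';', not as three separate characters, and every part of that pattern is starred, so it can match the empty string. A small sketch of the range behaviour (the list of candidate characters is just an illustrative choice):

```python
import re

candidates = [".", ";", "-", ":", "/", "7"]

# '.-;' inside brackets is a range from '.' (0x2E) to ';' (0x3B),
# which covers '/', the digits and ':' but not '-' (0x2D).
as_range = [c for c in candidates if re.fullmatch(r"[.-;]", c)]

# With '-' placed last it is a literal hyphen, so only three characters match.
as_literals = [c for c in candidates if re.fullmatch(r"[.;-]", c)]

print(as_range)     # ['.', ';', ':', '/', '7']
print(as_literals)  # ['.', ';', '-']
```

Since ':' falls inside that range, the original pattern would also treat the colon in "Text Editor:" as a delimiter.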
Upvotes: 1