newdev14

Reputation: 1111

Regex matching multiple delimiters

I am trying to split on the following delimiters: full stop, semicolon, *, +, ? and -. However, I only want to split on '-' when it occurs at the beginning of a sentence (so as not to split words like "non-functional").

I tried the following, but I am not making any progress. Any help will be appreciated:

sentences = re.split("[.-;]*[\+]*[\?]*[\*]*", txt)

Here is the sample text I've been trying this on:

- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support 
* Improved
stability
- New icon
* See this case mis-alignment

Expected output after the split is a list of items:

Text Editor: Now you can edit plain text files with airport tools, Updated Dropbox support, Improved stability, New icon, See this case mis-alignment

Upvotes: 0

Views: 71

Answers (3)

Iron Fist

Reputation: 10951

If you want to split your string on a defined set of delimiters, then do it this way:

>>> import re
>>> txt = '- Text Editor: Now you can edit plain text files with airport tools'
>>> r = re.split(r'([.;*+?-]+)',txt)
>>> r
['', '-', ' Text Editor: Now you can edit plain text files with airport tools']

If you don't want those delimiters in the resulting list, then:

>>> r = re.split(r'[.;*+?-]+',txt)
>>> r
['', ' Text Editor: Now you can edit plain text files with airport tools']
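A note on why the hyphen sits last in the character class above: in that position it is read as a literal '-'. Between two characters, as in the `[.-;]` from the question, it defines a range instead (from '.' to ';' in ASCII order), which matches '/', the digits and ':' but not '-' itself. A small sketch to illustrate; the test string is made up for the demonstration:

```python
import re

# '[.-;]' is a range from '.' (0x2E) to ';' (0x3B); '-' (0x2D) falls outside it.
range_class = re.compile(r'[.-;]')

# With the hyphen last, the class lists three literal characters: '.', ';' and '-'.
literal_class = re.compile(r'[.;-]')

demo = 'a-b.c;d:e/f5'  # hypothetical test string

print(range_class.findall(demo))    # ['.', ';', ':', '/', '5']
print(literal_class.findall(demo))  # ['-', '.', ';']
```

Inside a character class, `*`, `+` and `?` are already literal, so they need no backslashes.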

EDIT: In response to your comment below, use \s to match whitespace:

    >>> txt = '''- Text Editor: Now you can edit plain text files with airport tools
    * Updated Dropbox support 
    * Improved
    stability
    - New icon'''
    >>> r = re.split(r'(^|\s)+[.;*+?-]+($|\s)+', txt)
    >>> [i for i in r if len(i) > 1]
    ['Text Editor: Now you can edit plain text files with airport tools', 'Updated Dropbox support', 'Improved\n    stability', 'New icon']

Upvotes: 1

Avinash Raj

Reputation: 174706

You can use re.split with a multiline pattern like this:

>>> import re
>>> s = '''- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support 
* Improved
stability
- New icon'''
>>> [i for i in re.split(r'(?m)\s*^[-*+?]+\s*', s) if i]
['Text Editor: Now you can edit plain text files with airport tools', 'Updated Dropbox support', 'Improved\nstability', 'New icon']
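For completeness, here is a sketch of how the same idea might be extended to the full sample from the question, including the "mis-alignment" line. It treats '-' as a delimiter only at the start of a line, treats '.', ';', '*', '+' and '?' as delimiters anywhere, and then collapses line breaks so that "Improved" and "stability" end up in one item. The names `SAMPLE`, `DELIMS` and `split_items` are my own, and reading "beginning of a sentence" as "beginning of a line" is an assumption.

```python
import re

SAMPLE = """- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon
* See this case mis-alignment"""

# '-' is a delimiter only at the start of a line (after optional indentation);
# '.', ';', '*', '+' and '?' are delimiters wherever they appear.
DELIMS = re.compile(r'^[ \t]*-+|[.;*+?]+', flags=re.MULTILINE)

def split_items(text):
    parts = DELIMS.split(text)
    # Collapse runs of whitespace (including newlines) into single spaces.
    cleaned = (' '.join(part.split()) for part in parts)
    # Drop the empty pieces left over before the first delimiter.
    return [item for item in cleaned if item]

print(split_items(SAMPLE))
```

This prints the five items listed in the question, with "mis-alignment" left intact because its hyphen is not at the start of a line.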

Upvotes: 1

Bitsplitter

Reputation: 980

Try enumerating your delimiters like this:

re.split("[.;*+?] ", txt)
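A rough sketch of how requiring a trailing space after the delimiter could also cover the hyphen case: with '-' added to the class, a hyphen inside a word such as "mis-alignment" is never followed by a space, so it is not treated as a delimiter. The helper name `split_on_spaced_delims` and the one-line sample are hypothetical.

```python
import re

def split_on_spaced_delims(text):
    # A delimiter only counts when a space follows it, so the hyphen
    # inside "mis-alignment" is left alone.
    parts = re.split(r'[.;*+?-] ', text)
    # Trim surrounding whitespace and drop empty pieces.
    return [p.strip() for p in parts if p.strip()]

print(split_on_spaced_delims('- New icon * See this case mis-alignment'))
# ['New icon', 'See this case mis-alignment']
```

One caveat: a hyphen written with spaces around it in the middle of a sentence ("a - b") would still be split, so this only approximates the "start of a sentence" rule.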

Upvotes: 1
