pistacchio

Reputation: 58883

Python regex to split at word starting with

I know how to search for a word and split a string by it. Example:

import re

s = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua'
r = re.compile('(elit)')
r.split(s)
# => ['Lorem ipsum dolor sit amet, consectetur adipisicing ', 'elit', ', sed do eiusmod tempor incididunt ut labore et dolore magna aliqua']

How can I do the same when I only know the beginning of a word? For example, I'd like to split the string by "consect*" and have it split at the match "consectetur". Thanks

Upvotes: 3

Views: 1016

Answers (3)

satomacoto

Reputation: 11963

Use \w, which matches word characters (alphanumerics plus "_", i.e. [A-Za-z0-9_] for ASCII text):

r = re.compile(r'(consect\w*)')

Or use \S, which matches any non-whitespace character ([^ \t\r\n\v\f]):

r = re.compile(r'(consect\S*)')
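
The two classes differ once punctuation follows the word. A minimal sketch, using a made-up sample string (not the one from the question), of how each pattern splits:

```python
import re

# Hypothetical sample text: the target word is followed directly by a comma.
text = 'amet, consectetur, adipisicing elit'

# \w* stops at the comma, because ',' is not a word character.
word_parts = re.split(r'(consect\w*)', text)

# \S* continues until whitespace, so the comma becomes part of the match.
nonspace_parts = re.split(r'(consect\S*)', text)

print(word_parts)      # ['amet, ', 'consectetur', ', adipisicing elit']
print(nonspace_parts)  # ['amet, ', 'consectetur,', ' adipisicing elit']
```

So \w* is probably the safer choice when trailing punctuation should stay out of the matched word.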

Upvotes: 1

NPE

Reputation: 500367

Simply use (consect\w*) as the regex:

In [3]: import re

In [4]: s = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua'

In [5]: r = re.compile(r'(consect\w*)')  

In [6]: r.split(s)
Out[6]: 
['Lorem ipsum dolor sit amet, ',
 'consectetur',
 ' adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua']

The \w* matches any run of zero or more word characters (letters, digits and the underscore), so the match extends from "consect" to the end of the word. You could replace the \w with a different character class if your requirements are different.
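
One requirement that may differ: as written, the pattern also matches "consect" in the middle of a longer word. The sketch below assumes matches should only begin at the start of a word and adds a \b anchor for that; the helper name split_at_word_prefix and the sample text are hypothetical, and re.escape is used in case the prefix contains regex metacharacters.

```python
import re

def split_at_word_prefix(text, prefix):
    """Split text at every word that starts with prefix (hypothetical helper).

    Assumes prefix begins with a word character, so that \\b can anchor it.
    """
    pattern = re.compile(r'(\b' + re.escape(prefix) + r'\w*)')
    return pattern.split(text)

# Made-up sample: 'preconsecting' contains the prefix mid-word.
text = 'preconsecting the consectetur text'

# Without the anchor, the mid-word occurrence is matched too.
loose_parts = re.split(r'(consect\w*)', text)

# With the anchor, only the word that starts with the prefix is matched.
anchored_parts = split_at_word_prefix(text, 'consect')

print(loose_parts)     # ['pre', 'consecting', ' the ', 'consectetur', ' text']
print(anchored_parts)  # ['preconsecting the ', 'consectetur', ' text']
```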

For further details on Python regular expressions, see Regular Expression Syntax.

Upvotes: 1

sverre

Reputation: 6919

Use \w to match any word character, or [A-Za-z] if you want only ASCII alphabetic characters.

r = re.compile(r'(consect\w*)')
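
To illustrate the difference, here is a small sketch assuming Python 3 str patterns, where \w also matches non-ASCII letters, digits and the underscore. The sample string is made up for the example.

```python
import re

# Hypothetical sample: the word contains a non-ASCII letter, an underscore and a digit.
text = 'see consectetür_9 now'

# \w* covers the whole word, including 'ü', '_' and '9'.
word_parts = re.split(r'(consect\w*)', text)

# [A-Za-z]* stops at the first character outside A-Z/a-z, here 'ü'.
ascii_parts = re.split(r'(consect[A-Za-z]*)', text)

print(word_parts)   # ['see ', 'consectetür_9', ' now']
print(ascii_parts)  # ['see ', 'consectet', 'ür_9 now']
```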

Upvotes: 3
