Reputation: 58883
I know how to search for a word and split a string by it. Example:
s = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua'
r = re.compile('(elit)')
r.split(s)
# => ['Lorem ipsum dolor sit amet, consectetur adipisicing ', 'elit', ', sed do eiusmod tempor incididunt ut labore et dolore magna aliqua']
How can I do the same when I only know the beginning of a word? For example, I'd like to split the string by "consect*" and have it split at the match of "consectetur". Thanks
Upvotes: 3
Views: 1016
Reputation: 11963
Use \w (alphanumeric characters plus "_", i.e. [A-Za-z0-9_]):
r = re.compile(r'(consect\w*)')
or use \S (non-whitespace characters, i.e. [^ \t\r\n\v\f]):
r = re.compile(r'(consect\S*)')
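The two patterns give the same result on the string from the question, because "consectetur" is followed by a space there. A minimal sketch of where they differ, using a shortened sample string of my own (not from the question) in which the word is followed by a comma:

```python
import re

# Hypothetical sample: the target word is directly followed by punctuation.
s = 'amet, consectetur, adipisicing'

# \w* stops at the comma, because "," is not a word character.
word_parts = re.split(r'(consect\w*)', s)

# \S* keeps going until whitespace, so the comma becomes part of the match.
nonspace_parts = re.split(r'(consect\S*)', s)

print(word_parts)      # ['amet, ', 'consectetur', ', adipisicing']
print(nonspace_parts)  # ['amet, ', 'consectetur,', ' adipisicing']
```

So \w is probably the better fit if you want just the word, and \S if trailing punctuation should stay attached to it.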
Upvotes: 1
Reputation: 500367
Simply use (consect\w*) as the regex:
In [3]: import re
In [4]: s = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua'
In [5]: r = re.compile(r'(consect\w*)')
In [6]: r.split(s)
Out[6]:
['Lorem ipsum dolor sit amet, ',
'consectetur',
' adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua']
The \w* matches zero or more word characters (letters, digits, and underscore). You could replace the \w with a different character class if your requirements are different.
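One possible refinement, if "beginning of a word" should be taken literally: as written, the pattern also matches "consect" in the middle of a longer word. A sketch under that assumption, where split_by_prefix is a hypothetical helper name and "preconsectetur" is a made-up word used only for illustration:

```python
import re

def split_by_prefix(prefix, text):
    """Split text at every word starting with prefix, keeping the matched word."""
    # re.escape guards against regex metacharacters in the prefix;
    # \b requires the match to start at a word boundary.
    pattern = re.compile(r'(\b' + re.escape(prefix) + r'\w*)')
    return pattern.split(text)

parts = split_by_prefix('consect', 'amet, consectetur adipisicing')
print(parts)  # ['amet, ', 'consectetur', ' adipisicing']

# Without \b the prefix is also found inside a longer word.
unanchored = re.split(r'(consect\w*)', 'preconsectetur elit')
anchored = split_by_prefix('consect', 'preconsectetur elit')
print(unanchored)  # ['pre', 'consectetur', ' elit']
print(anchored)    # ['preconsectetur elit']
```

If matching inside words is acceptable for your data, the plain (consect\w*) pattern is enough.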
For further details on Python regular expressions, see Regular Expression Syntax.
Upvotes: 1
Reputation: 6919
Use \w to match any word character, or [A-Za-z] if you want only ASCII alphabetic characters.
r = re.compile(r'(consect\w*)')
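To illustrate the difference, here is a small sketch assuming Python 3, where \w in a str pattern also matches non-ASCII letters. The accented sample word is invented for the example:

```python
import re

# Hypothetical sample containing a non-ASCII letter (\u00e9 is "é").
s = 'amet, consect\u00e9tur adipisicing'

# \w* matches the accented letter, so the whole word is captured.
unicode_parts = re.split(r'(consect\w*)', s)

# [A-Za-z]* stops at the first non-ASCII letter.
ascii_parts = re.split(r'(consect[A-Za-z]*)', s)

print(unicode_parts)  # ['amet, ', 'consectétur', ' adipisicing']
print(ascii_parts)    # ['amet, ', 'consect', 'étur adipisicing']
```

Note that \w also accepts digits and underscores, which [A-Za-z] does not.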
Upvotes: 3