Peter Chen

Reputation: 67

Using a regular expression to split a string in Python

Using

re.compile(r"(.+?)\1+").findall('44442(2)2(2)44')

I get

['4','2(2)','4']

but how can I get

['4444','2(2)2(2)','44']

using a regular expression?

Thanks

Upvotes: 3

Views: 91

Answers (3)

user6732794

No change to your pattern is needed; you just need to use the right function for the job. re.findall returns a list of groups if there are capturing groups in the pattern. To get the entire match, use re.finditer instead, and extract the full match from each match object with group(0).

import re

pattern = re.compile(r"(.+?)\1+")
[match.group(0) for match in pattern.finditer('44442(2)2(2)44')]
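
For comparison, here is a minimal sketch that runs both calls on the question's input. The variable names (text, groups_only, full_matches) are illustrative and not from the original post.

```python
import re

pattern = re.compile(r"(.+?)\1+")
text = '44442(2)2(2)44'

# findall returns only the contents of capturing group 1 for each match
groups_only = pattern.findall(text)

# finditer yields match objects; group(0) is the entire matched text
full_matches = [m.group(0) for m in pattern.finditer(text)]

print(groups_only)
print(full_matches)
```

The first list holds just the repeated unit of each run, while the second holds the whole run, which is what the question asks for.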

Upvotes: 4

Amadan

Reputation: 198314

With minimal change to OP's regular expression:

[m[0] for m in re.compile(r"((.+?)\2+)").findall('44442(2)2(2)44')]

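findall gives you the full match if the pattern has no groups. With one group it returns that group's text, and with several groups it returns a tuple of groups per match. Since your regexp needs a group for the backreference to work, we simply add another group that encloses the full match (so the backreference becomes \2) and extract it afterwards with m[0].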
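
A small sketch of how the return shape of findall changes with the number of capturing groups. The \d+ pattern and the variable names are only illustrative assumptions, not part of the answer above.

```python
import re

text = '44442(2)2(2)44'

# No groups: a list of full matches
no_groups = re.findall(r"\d+", text)

# One group: a list of strings holding group 1 only
one_group = re.findall(r"(.+?)\1+", text)

# Two groups: a list of (group 1, group 2) tuples
two_groups = re.findall(r"((.+?)\2+)", text)

# The outer group is the whole run; the inner group is the repeated unit
wholes = [outer for outer, _unit in two_groups]

print(no_groups)
print(one_group)
print(two_groups)
print(wholes)
```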

Upvotes: 3

heemayl

Reputation: 41987

You can do:

[i[0] for i in re.findall(r'((\d)(?:[()]*\2*[()]*)*)', s)]

Here the Regex is:

((\d)(?:[()]*\2*[()]*)*)

which will output a list of tuples containing the two captured groups. We are only interested in the first one, hence i[0].

Example:

In [15]: s
Out[15]: '44442(2)2(2)44'

In [16]: [i[0] for i in re.findall(r'((\d)(?:[()]*\2*[()]*)*)', s)]
Out[16]: ['4444', '2(2)2(2)', '44']
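
One thing worth checking before picking this pattern: it is tied to digits and parentheses, and it also accepts a single unrepeated digit, whereas the backreference pattern from the question accepts any repeated substring but requires at least one repetition. A rough sketch of the difference, using a made-up input ('aabb7') that is not from the question:

```python
import re

sample = 'aabb7'  # hypothetical input, chosen only to show the difference

# Digit-specific pattern: skips the letters, accepts the lone digit
digit_runs = [i[0] for i in re.findall(r'((\d)(?:[()]*\2*[()]*)*)', sample)]

# Backreference pattern: matches any repeated unit, skips the lone digit
repeated_runs = [m.group(0) for m in re.finditer(r"(.+?)\1+", sample)]

print(digit_runs)
print(repeated_runs)
```

Which behaviour is preferable depends on what the real input looks like.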

Upvotes: 0
