Reputation: 2851
I have a pandas Series from which I need to extract all the substrings within parentheses. A string may contain several such substrings, or none at all. How can this be handled? Sample data:
abc(def)ghi(jkl)aaa
jklmnopqr(jkl)
(ab)cde(ghi)
lmnoprst uvwxyz
If I use str.extract, I can obtain only one substring per string, e.g. with a.str.extract('.*\((.*)\)'). Since the leading .* is greedy, this returns only the last parenthesized substring, so for the first string I miss the substring def.
How can this be solved?
The desired outcome is:
def
jkl
ab
ghi
Upvotes: 0
Views: 998
Reputation: 153560
Try:
df[0].str.extractall(r'\((\w+)\)')
Output:
           0
  match
0 0      def
  1      jkl
1 0      jkl
2 0       ab
  1      ghi
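For reference, here is a minimal self-contained sketch of the same idea, assuming the data lives in a Series named a as in the question. The pattern [^()]+ is a substitution for \w+ (an assumption on my part, so that contents with spaces or punctuation are also captured); the names matches, flat and per_row are only illustrative.

```python
import pandas as pd

# Sample data from the question; the Series name `a` follows the question's usage.
a = pd.Series([
    "abc(def)ghi(jkl)aaa",
    "jklmnopqr(jkl)",
    "(ab)cde(ghi)",
    "lmnoprst uvwxyz",
])

# Assumed pattern: capture any run of non-parenthesis characters between ( and ).
pattern = r"\(([^()]+)\)"

# extractall returns one row per match, indexed by (original index, match number).
# Rows without any match (the last string here) simply do not appear.
matches = a.str.extractall(pattern)

# Column 0 holds the single capture group; drop the MultiIndex for a flat Series.
flat = matches[0].reset_index(drop=True)

# Alternative: findall keeps one list per original row, with an empty list
# for strings that contain no parenthesized substring.
per_row = a.str.findall(pattern)

print(flat.tolist())
print(per_row.tolist())
```

If you need to keep track of which row each substring came from, keep the MultiIndex from extractall instead of resetting it; if you want one entry per original row, the findall variant may be the better fit.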
Upvotes: 2