Reputation: 2324
I have the following Pandas Series:
SC_S193_M7.CONTROLDAY10.EPI.P1_Stem
SC_S194_M7.CONTROLDAY10.EPI.P1_Goblet
SC_S102_M1.CONTROLDAY3.EPI2_Enterocyte
SC_S106_M1.CONTROLDAY3.EPI2_Goblet
I want to use regex to extract the string after the last underscore in each row of this series. I was able to come up with a regex that matches the pattern, but I am not sure how to use it in a pandas Series method.
The regex I used to match the pattern (the idea being to replace the match with the first capturing group, \1):
SC_S\d{3}_M\d\.CONTROLDAY\d{1,2}\.EPI\d?(?:\.P\d_|_)
I tried using .replace() as follows but that did not work out:
.replace('SC_S\d{3}_M\d\.CONTROLDAY\d{1,2}\.EPI\d?(?:\.P\d_|_)(\w+)')
Any idea how to use a pandas Series method to extract the string after the last underscore, or to find the matching pattern and replace it with the first group?
Upvotes: 1
Views: 690
Reputation: 5914
Another variant (assuming that s is your series) that should work is something along the lines of:
import re
s.apply(lambda r: re.sub(r'.*_([^_]*)$', r'\1', r))
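To see what that pattern does without pandas, here is a minimal sketch that applies the same regex to the four sample strings from the question as a plain Python list. The names labels, pattern and extracted are made up for illustration. The greedy .* consumes everything up to the last underscore, so the capturing group holds only the final piece.

```python
import re

# Sample values copied from the question; the variable names are illustrative.
labels = [
    "SC_S193_M7.CONTROLDAY10.EPI.P1_Stem",
    "SC_S194_M7.CONTROLDAY10.EPI.P1_Goblet",
    "SC_S102_M1.CONTROLDAY3.EPI2_Enterocyte",
    "SC_S106_M1.CONTROLDAY3.EPI2_Goblet",
]

# Greedy '.*_' runs up to the LAST underscore; group 1 is whatever follows it.
pattern = re.compile(r".*_([^_]*)$")

# Replace the whole match with group 1, leaving only the trailing piece.
extracted = [pattern.sub(r"\1", label) for label in labels]
print(extracted)
```

Passing the same function to s.apply gives the element-wise equivalent on a Series.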
Upvotes: 2
Reputation: 210842
I think you can split it instead of using RegEx:
In [170]: s
Out[170]:
0 SC_S193_M7.CONTROLDAY10.EPI.P1_Stem
1 SC_S194_M7.CONTROLDAY10.EPI.P1_Goblet
2 SC_S102_M1.CONTROLDAY3.EPI2_Enterocyte
3 SC_S106_M1.CONTROLDAY3.EPI2_Goblet
Name: 0, dtype: object
In [171]: s.str.split('_').str[-1]
Out[171]:
0 Stem
1 Goblet
2 Enterocyte
3 Goblet
Name: 0, dtype: object
or better using rsplit(..., n=1):
In [174]: s.str.rsplit('_', n=1).str[-1]
Out[174]:
0 Stem
1 Goblet
2 Enterocyte
3 Goblet
Name: 0, dtype: object
alternatively you can use .str.extract():
In [177]: s.str.extract(r'.*_([^_]*)$', expand=False)
Out[177]:
0 Stem
1 Goblet
2 Enterocyte
3 Goblet
Name: 0, dtype: object
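For completeness, here is a hedged sketch of how the replace-with-first-group idea from the question could be written. It assumes a pandas version where .str.replace accepts regex=True, and it rebuilds the Series from the sample values. The names via_replace, via_rsplit and via_extract are illustrative only. The original attempt passed just a pattern, whereas a replacement string is needed as well.

```python
import pandas as pd

# Series rebuilt from the sample rows in the question.
s = pd.Series([
    "SC_S193_M7.CONTROLDAY10.EPI.P1_Stem",
    "SC_S194_M7.CONTROLDAY10.EPI.P1_Goblet",
    "SC_S102_M1.CONTROLDAY3.EPI2_Enterocyte",
    "SC_S106_M1.CONTROLDAY3.EPI2_Goblet",
])

# The asker's pattern, including the trailing capturing group (\w+).
asker_pattern = r"SC_S\d{3}_M\d\.CONTROLDAY\d{1,2}\.EPI\d?(?:\.P\d_|_)(\w+)"

# .str.replace needs a pattern AND a replacement; r"\1" refers to group 1.
via_replace = s.str.replace(asker_pattern, r"\1", regex=True)

# The two regex-light approaches from this answer, for comparison.
via_rsplit = s.str.rsplit("_", n=1).str[-1]
via_extract = s.str.extract(r".*_([^_]*)$", expand=False)

print(via_replace.tolist())
```

All three give the same values for this data. The rsplit version does not depend on the exact layout of the prefix, so it is probably the more robust choice if the naming scheme varies.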
Upvotes: 4