Reputation: 326
As seen in my previous question, "Rename columns regex, keep name if no match":
why do str.extract() and re.search() give different output for the same regex?
import re
import pandas as pd
# raw strings keep the backslashes in the column names literal
data = {'First_Column': [1,2,3], 'Second_Column': [1,2,3],
        r'\First\Mid\LAST.Ending': [1,2,3], r'First1\Mid1\LAST1.Ending': [1,2,3]}
df = pd.DataFrame(data)
First_Column Second_Column \First\Mid\LAST.Ending First1\Mid1\LAST1.Ending
df.columns.str.extract()
df.columns.str.extract(r'([^\\]+)\.Ending')
0
0 NaN
1 NaN
2 LAST
3 LAST1
re.search()
col = df.columns.tolist()
for i in col[2:]:
    print(re.search(r'([^\\]+)\.Ending', i).group())
LAST.Ending
LAST1.Ending
THX
Upvotes: 1
Views: 102
Reputation: 1640
From the pandas.Series.str.extract docs:
Extract capture groups in the regex pat as columns in a DataFrame.
So str.extract returns only the capture group, whereas re.search with group() or group(0) returns the whole match. If you change it to group(1), it returns capture group 1.
This will return the full match:
for i in col[2:]:
    print(re.search(r'([^\\]+)\.Ending', i).group())
LAST.Ending
LAST1.Ending
This will return only the capture group:
for i in col[2:]:
    print(re.search(r'([^\\]+)\.Ending', i).group(1))
LAST
LAST1
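To compare both approaches side by side, here is a minimal sketch that runs the same pattern over all four columns from the question. The names pattern, extracted, group0 and group1 are only illustrative, and the None guard is my addition: re.search returns None for columns that do not match, which is presumably why the loops above start at col[2:].

```python
import re

import pandas as pd

# Same column names as in the question; raw strings keep the backslashes literal.
columns = ['First_Column', 'Second_Column',
           r'\First\Mid\LAST.Ending', r'First1\Mid1\LAST1.Ending']
df = pd.DataFrame({name: [1, 2, 3] for name in columns})

pattern = r'([^\\]+)\.Ending'

# str.extract returns capture group 1; with expand=False and a single group
# the result is an Index rather than a one-column DataFrame.
extracted = df.columns.str.extract(pattern, expand=False)

# re.search over every column: group(1) mirrors str.extract,
# group(0) (same as group()) is the whole match.
group0, group1 = [], []
for name in df.columns:
    m = re.search(pattern, name)
    # Non-matching names give m = None, so guard before calling group().
    group0.append(m.group(0) if m else None)
    group1.append(m.group(1) if m else None)

print(list(extracted))
print(group0)
print(group1)
```

The group1 list should line up with what str.extract gives (apart from NaN vs. None for the non-matching columns), while group0 still carries the ".Ending" suffix.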
Further reading: Link
Upvotes: 1