Reputation: 180
I have a dataframe containing keywords and value in all columns. See the example below.
I want to apply regex to all the columns. So I use for loop and apply the regex:
for i in range (1,maxExtended_Keywords):
temp = 'extdkey_' + str(i)
Extended_Keywords[temp] = Extended_Keywords[temp].str.extract(":(.*)",expand=True)
And I get the desired final result. No issues there.
However, just curios is there a pythonic way to apply regex to entire dataframe instead of using for loop and applying to column wise.
Thanks,
Upvotes: 8
Views: 4319
Reputation: 657
You can use applymap, it will apply some function for each element in dataframe, for this problem you can do this:
func = lambda x: re.findall('^.*:\s*(.*)', x)[0] if re.findall('^.*:\s*(.*)', str(x)) else x
df.applymap(func)
Caution: Avoid to use applymap for huge dataframes due to efficiency issue.
Upvotes: 1
Reputation: 294218
Use pandas.DataFrame.replace
with regex=True
df.replace('^.*:\s*(.*)', r'\1', regex=True)
Notice that my pattern uses parentheses to capture the part after the ':'
and uses a raw string r'\1'
to reference that capture group.
df = pd.DataFrame([
[np.nan, 'thing1: hello'],
['thing2: world', np.nan]
], columns=['extdkey1', 'extdkey2'])
df
extdkey1 extdkey2
0 NaN thing1: hello
1 thing2: world NaN
df.replace('^.*:\s*(.*)', r'\1', regex=True)
extdkey1 extdkey2
0 NaN hello
1 world NaN
Upvotes: 8