prasadav
prasadav

Reputation: 180

Pythonic way of applying regex to all columns of dataframe

I have a dataframe containing keywords and value in all columns. See the example below.

Input DataFrame

I want to apply regex to all the columns. So I use for loop and apply the regex:

for i in range (1,maxExtended_Keywords):
    temp = 'extdkey_' + str(i)
    Extended_Keywords[temp] = Extended_Keywords[temp].str.extract(":(.*)",expand=True)

And I get the desired final result. No issues there.

Desired output

However, just curios is there a pythonic way to apply regex to entire dataframe instead of using for loop and applying to column wise.

Thanks,

Upvotes: 8

Views: 4319

Answers (2)

romulomadu
romulomadu

Reputation: 657

You can use applymap, it will apply some function for each element in dataframe, for this problem you can do this:

func = lambda x: re.findall('^.*:\s*(.*)', x)[0] if re.findall('^.*:\s*(.*)', str(x)) else x
df.applymap(func)

Caution: Avoid to use applymap for huge dataframes due to efficiency issue.

Upvotes: 1

piRSquared
piRSquared

Reputation: 294218

Use pandas.DataFrame.replace with regex=True

df.replace('^.*:\s*(.*)', r'\1', regex=True)

Notice that my pattern uses parentheses to capture the part after the ':' and uses a raw string r'\1' to reference that capture group.


MCVE

df = pd.DataFrame([
    [np.nan, 'thing1: hello'],
    ['thing2: world', np.nan]
], columns=['extdkey1', 'extdkey2'])

df

        extdkey1       extdkey2
0            NaN  thing1: hello
1  thing2: world            NaN

df.replace('^.*:\s*(.*)', r'\1', regex=True)

  extdkey1 extdkey2
0      NaN    hello
1    world      NaN

Upvotes: 8

Related Questions