Han Zhengzu
Han Zhengzu

Reputation: 3852

Replace certain column with `filter(like = "")` in Pandas

Sometimes, I would manipulate some columns of the dataframe and re-change it.

For example, one dataframe df has 6 columns like this:

A, B1, B2, B3, C, D

And I want to change the values in the columns (B1,B2,B3) transform into (B1*A, B2*A, B3*A).
Aside the loop subroutine which is slow, the df.filter(like = 'B') will accelerate a lot.

df.filter(like = "B").mul(df.A, axis = 0) can produce the right answer. But I can't change the B-like columns in df using:
df.filter(like = "B") =df.filter(like = "B").mul(df.A. axis = 0)`

How to achieve it? I know using pd.concat to creat a new dataframe can get it done. But when the number of columns are huge, this method may be loss of efficiency. What I want to do is to assign new value to the columns already exist.

Any advices would be appreciate!

Upvotes: 1

Views: 830

Answers (2)

jezrael
jezrael

Reputation: 863166

Use str.contains with boolean indexing:

cols = df.columns[df.columns.str.contains('B')]
df[cols] = df[cols].mul(df.A, axis = 0)

Sample:

import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B1':[4,5,6],
                   'B2':[7,8,9],
                   'B3':[1,3,5],
                   'C':[5,3,6],
                   'D':[7,4,3]})

print (df)
   A  B1  B2  B3  C  D
0  1   4   7   1  5  7
1  2   5   8   3  3  4
2  3   6   9   5  6  3

cols = df.columns[df.columns.str.contains('B')]
print (cols)
Index(['B1', 'B2', 'B3'], dtype='object')

df[cols] = df[cols].mul(df.A, axis = 0)

print (df)
   A  B1  B2  B3  C  D
0  1   4   7   1  5  7
1  2  10  16   6  3  4
2  3  18  27  15  6  3

Timings:

len(df)=3:

In [17]: %timeit (a(df))
1000 loops, best of 3: 1.36 ms per loop

In [18]: %timeit (b(df1))
100 loops, best of 3: 2.39 ms per loop

len(df)=30k:

In [14]: %timeit (a(df))
100 loops, best of 3: 2.89 ms per loop

In [15]: %timeit (b(df1))
100 loops, best of 3: 4.71 ms per loop

Code:

import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B1':[4,5,6],
                   'B2':[7,8,9],
                   'B3':[1,3,5],
                   'C':[5,3,6],
                   'D':[7,4,3]})

print (df)
df = pd.concat([df]*10000).reset_index(drop=True)
df1 = df.copy()

def a(df):
    cols = df.columns[df.columns.str.contains('B')]
    df[cols] = df[cols].mul(df.A, axis = 0)
    return (df)


def b(df):
    df.loc[:, df.filter(regex=r'^B').columns] = df.loc[:, df.filter(regex=r'^B').columns].mul(df.A, axis=0)    
    return (df)

print (a(df))
print (b(df1))

Upvotes: 1

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210882

you have almost done it:

In [136]: df.loc[:, df.filter(regex=r'^B').columns] = df.loc[:, df.filter(regex=r'^B').columns].mul(df.A, axis=0)

In [137]: df
Out[137]:
   A  B1  B2  B3  B4  F
0  1   4   7   1   5  7
1  2  10  16   6   6  4
2  3  18  27  15  18  3

Upvotes: 1

Related Questions