Reputation: 3852
Sometimes, I would manipulate some columns of the dataframe and re-change it.
For example, one dataframe df has 6 columns like this:
A, B1, B2, B3, C, D
And I want to change the values in the columns (B1,B2,B3) transform into (B1*A, B2*A, B3*A).
Aside the loop subroutine which is slow, the df.filter(like = 'B')
will accelerate a lot.
df.filter(like = "B").mul(df.A, axis = 0)
can produce the right answer. But I can't change the B-like columns in df
using:
df.filter(like = "B") =
df.filter(like = "B").mul(df.A. axis = 0)`
How to achieve it? I know using pd.concat
to creat a new dataframe can get it done. But when the number of columns are huge, this method may be loss of efficiency. What I want to do is to assign new value to the columns already exist.
Any advices would be appreciate!
Upvotes: 1
Views: 830
Reputation: 863166
Use str.contains
with boolean indexing
:
cols = df.columns[df.columns.str.contains('B')]
df[cols] = df[cols].mul(df.A, axis = 0)
Sample:
import pandas as pd
df = pd.DataFrame({'A':[1,2,3],
'B1':[4,5,6],
'B2':[7,8,9],
'B3':[1,3,5],
'C':[5,3,6],
'D':[7,4,3]})
print (df)
A B1 B2 B3 C D
0 1 4 7 1 5 7
1 2 5 8 3 3 4
2 3 6 9 5 6 3
cols = df.columns[df.columns.str.contains('B')]
print (cols)
Index(['B1', 'B2', 'B3'], dtype='object')
df[cols] = df[cols].mul(df.A, axis = 0)
print (df)
A B1 B2 B3 C D
0 1 4 7 1 5 7
1 2 10 16 6 3 4
2 3 18 27 15 6 3
Timings:
len(df)=3
:
In [17]: %timeit (a(df))
1000 loops, best of 3: 1.36 ms per loop
In [18]: %timeit (b(df1))
100 loops, best of 3: 2.39 ms per loop
len(df)=30k
:
In [14]: %timeit (a(df))
100 loops, best of 3: 2.89 ms per loop
In [15]: %timeit (b(df1))
100 loops, best of 3: 4.71 ms per loop
Code:
import pandas as pd
df = pd.DataFrame({'A':[1,2,3],
'B1':[4,5,6],
'B2':[7,8,9],
'B3':[1,3,5],
'C':[5,3,6],
'D':[7,4,3]})
print (df)
df = pd.concat([df]*10000).reset_index(drop=True)
df1 = df.copy()
def a(df):
cols = df.columns[df.columns.str.contains('B')]
df[cols] = df[cols].mul(df.A, axis = 0)
return (df)
def b(df):
df.loc[:, df.filter(regex=r'^B').columns] = df.loc[:, df.filter(regex=r'^B').columns].mul(df.A, axis=0)
return (df)
print (a(df))
print (b(df1))
Upvotes: 1
Reputation: 210882
you have almost done it:
In [136]: df.loc[:, df.filter(regex=r'^B').columns] = df.loc[:, df.filter(regex=r'^B').columns].mul(df.A, axis=0)
In [137]: df
Out[137]:
A B1 B2 B3 B4 F
0 1 4 7 1 5 7
1 2 10 16 6 6 4
2 3 18 27 15 18 3
Upvotes: 1