Reputation: 678
I would like to use assign() to create new columns through method chaining, which is an elegant way to express a sequence of operations on a DataFrame. However, I can’t find a way to do this without creating a copy, which is much slower than modifying in place because of the extra memory allocation. Is it possible to do this with a simple method that modifies the DataFrame in place and returns it?
For example:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 2), columns=['a', 'b'])
df['c'] = df.a + df.b  # in place and fast, but cannot be chained
df.sum()  # ...so it takes two lines of code
df.assign(c=df.a + df.b).sum()  # compact, but MUCH slower because assign() returns a copy of the df rather than assigning in place
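To make the copy behaviour concrete, here is a minimal check (a sketch, with the variable names result, copied and untouched chosen only for illustration) showing that assign() hands back a different object and leaves the original frame without the new column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 2), columns=["a", "b"])

# assign() builds and returns a new DataFrame object
result = df.assign(c=df["a"] + df["b"])

copied = result is not df          # a different object came back
untouched = "c" not in df.columns  # the original frame has no column "c"
```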
Upvotes: 1
Views: 2810
Reputation: 68146
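.assign() can take a callable that receives the current state of the DataFrame at that point in the chain, so a new column can be computed from an intermediate result without binding it to a name first.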
import numpy as np
import pandas as pd

df = (
    pd.DataFrame(np.random.randn(5, 2), columns=['a', 'b'])
    .assign(c=lambda d: d['a'] + d['b'])  # d is the frame at this point in the chain
    .sum()
)
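Note that assign() still returns a new DataFrame, even when given a callable. If the goal really is to mutate the frame and keep chaining, one possible approach is to combine DataFrame.pipe() with a small helper. The sketch below assumes a hypothetical function named assign_inplace, which is not part of pandas. It writes the columns into the frame it receives and returns that same frame. Whether this is measurably faster depends on the frame size and the pandas version, so it is worth timing on real data.

```python
import numpy as np
import pandas as pd


def assign_inplace(frame, **columns):
    """Hypothetical helper (not a pandas API): add columns to `frame` itself and return it."""
    for name, value in columns.items():
        # Like assign(), accept either a ready-made value or a callable
        frame[name] = value(frame) if callable(value) else value
    return frame


df = pd.DataFrame(np.random.randn(5, 2), columns=["a", "b"])

# pipe() passes df as the first argument, so the chain continues on the same object
totals = df.pipe(assign_inplace, c=lambda d: d["a"] + d["b"]).sum()
```

Because the helper mutates its input, df itself now has the column c, which is the trade-off compared with the side-effect-free assign().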
Upvotes: 3