ScarletPumpernickel

Reputation: 678

Assign to Pandas dataframe in place with method chaining

I would like to use assign() to create new columns within a method chain (an elegant way of expressing a series of operations on a dataframe). However, I can't find a way to do this without creating a copy, which is much slower than working in place because of the associated memory allocation. Is it possible to do this with a simple method that modifies the dataframe in place and returns it?

For example:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 2), columns=['a', 'b'])

df['c'] = df.a + df.b  # in place and fast, but cannot be chained...
df.sum()               # ...so it takes two lines of code

df.assign(c=df.a + df.b).sum()  # compact, but MUCH slower: assign() returns a copy of the df rather than assigning in place
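
To make the copy behaviour concrete, here is a small check (a sketch, not a benchmark) showing that assign() hands back a different object and leaves the original untouched. The name `out` is just for illustration. How much data is physically copied may depend on the pandas version and its Copy-on-Write settings, so this only demonstrates the semantics, not the cost.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 2), columns=["a", "b"])

# assign() builds and returns a new DataFrame object
out = df.assign(c=df["a"] + df["b"])

print(out is df)           # False: a different object comes back
print("c" in df.columns)   # False: the original has no new column
print("c" in out.columns)  # True: only the returned frame has it
```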

Upvotes: 1

Views: 2810

Answers (1)

Paul H

Reputation: 68146

.assign can take a callable, which receives the dataframe as it exists at that point in the chain.


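import numpy as np
import pandas as pd

# the result is a Series of column sums, not a dataframe
totals = (
    pd.DataFrame(np.random.randn(5, 2), columns=['a', 'b'])
      .assign(c=lambda df: df["a"] + df["b"])  # df is the frame at this point in the chain
      .sum()
)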
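
Note that assign() still returns a new dataframe here; the callable only removes the need for a named intermediate variable. If mutating the original really matters, one possible sketch is a small helper that sets the columns on the frame it is given and returns that same frame. `assign_inplace` is a hypothetical name of my own, not part of pandas.

```python
import numpy as np
import pandas as pd


def assign_inplace(frame, **columns):
    """Hypothetical helper: add columns to `frame` itself and return it."""
    for name, value in columns.items():
        # Like assign(), accept a callable that receives the current frame
        frame[name] = value(frame) if callable(value) else value
    return frame


df = pd.DataFrame(np.random.randn(5, 2), columns=["a", "b"])

# One expression: df gains column 'c', then the same object is summed
totals = assign_inplace(df, c=lambda d: d["a"] + d["b"]).sum()

print("c" in df.columns)  # True: the original frame was modified
```

The helper could probably also be passed to df.pipe(...) to keep everything in one chain, but whether pipe hands the function the original object or a copy seems to depend on the pandas version and Copy-on-Write mode, so I would check that before relying on it.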

Upvotes: 3
