Lance

Reputation: 741

Set multiple columns derived from another column based on a condition

Can someone please help? No matter what I do, I always get some kind of length mismatch error when trying to set multiple columns.

These 2 lines work:

df.loc[(condition), ['column1', 'column2']] = 10, 20
df.loc[(condition), ['column1', 'column2']] = df['column3'] + 10

This one gives the error "Must have equal len keys and value when setting with an ndarray":

df.loc[(condition), ['column1', 'column2']] = df['column3'] + 10, df['column3'] - 10

This doesn't make sense to me, because len(column1) == len(column2) and len(df['column3'] + 10) == len(df['column3'] - 10).

Upvotes: 1

Views: 494

Answers (1)

piterbarg

Reputation: 8219

There are at least two things going on.

First, what is condition? If it selects only some of the rows on the left-hand side while the right-hand side still covers every row, there will be a length mismatch.

Second, even if the lengths are the same, on the left you have an array of shape (N, 2) (where N is the number of selected rows) and on the right you have a tuple of two arrays, which is effectively shape (2, N). Neither pandas nor numpy knows how to broadcast those into the same shape.
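
To make the two shapes concrete, here is a small sketch. The toy frame and the condition are assumptions made up for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Toy data; the values and the condition are illustrative assumptions.
df = pd.DataFrame({
    "column1": [0.0, 0.0, 0.0, 0.0],
    "column2": [0.0, 0.0, 0.0, 0.0],
    "column3": [1.0, 2.0, 3.0, 4.0],
})
condition = df["column3"] > 2  # selects 2 of the 4 rows

# Shape of the block being assigned to: (selected rows, 2 columns)
lhs_shape = df.loc[condition, ["column1", "column2"]].shape

# The tuple on the right stacks as (2 items, all rows)
rhs = (df["column3"] + 10, df["column3"] - 10)
rhs_shape = np.array([s.to_numpy() for s in rhs]).shape

print(lhs_shape)  # (2, 2)
print(rhs_shape)  # (2, 4)
```

Both problems show up here at once: the right-hand side has the wrong orientation, and it also carries all four rows while the left-hand side only has the two selected ones.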

The easiest approach, I think, is to go column by column. But the closest I could get to your syntax that works is:

df.loc[condition, ['column1', 'column2']] = df.loc[condition, ['column3']].to_numpy() + np.array([10, -10])

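Here numpy's broadcasting rules kick in on the right-hand side: a single-column 2-D array plus a length-2 array gives a two-column array, one column per offset, which is the layout the left-hand side expects.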
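
For comparison, a minimal sketch of both routes on the same made-up data: the broadcasting assignment and the column-by-column version. The toy frame, the condition, and the names a, b and base are assumptions for illustration, and the offsets +10 and -10 are taken from the question.

```python
import numpy as np
import pandas as pd

# Toy data; the values and the condition are illustrative assumptions.
df = pd.DataFrame({
    "column1": [0.0, 0.0, 0.0, 0.0],
    "column2": [0.0, 0.0, 0.0, 0.0],
    "column3": [1.0, 2.0, 3.0, 4.0],
})
condition = df["column3"] > 2

# Route 1: broadcasting. Select the same rows on both sides so the
# row counts agree, then let (n, 1) + (2,) broadcast to (n, 2).
a = df.copy()
base = a.loc[condition, ["column3"]].to_numpy()
a.loc[condition, ["column1", "column2"]] = base + np.array([10, -10])

# Route 2: column by column. Assigning a Series lets pandas align
# on the index, so the unselected rows are simply left alone.
b = df.copy()
b.loc[condition, "column1"] = b["column3"] + 10
b.loc[condition, "column2"] = b["column3"] - 10

print(a.equals(b))  # True
```

The column-by-column version is longer but relies on index alignment rather than on array shapes, so it is harder to get wrong when the condition changes.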

Upvotes: 1
