Reputation: 5666
Based on this solution, I created several masks on a Pandas dataframe to create a new column which should be filled from different columns (based on the conditions).
The (simplified) code looks as follows:
mask0 = (df['condition'] == 1)
mask1 = (df['condition'] == 0)
df.loc[mask0, 'newColumn'] = df['otherColumn1']
df.loc[mask1, 'newColumn'] = df['otherColumn2']
However upon execution of the third line I get the following error:
ValueError: cannot reindex from a duplicate axis
If I simply do this, it works:
df.loc[mask0, 'newColumn'] = 1
What am I doing wrong?
Upvotes: 9
Views: 7866
Reputation: 476557
You need to mask the "data provider" as well:
df.loc[mask0, 'newColumn'] = df[mask0, 'otherColumn1']
df.loc[mask1, 'newColumn'] = df[mask1, 'otherColumn2']
If the first condition is true if the latter is false and vice versa, we can use np.where(..)
:
df['newColumn'] = np.where(mask0, df['otherColumn0'], df['otherColumn2'])
or you can use np.select(..)
in case the two can be both false where we fallback on the old values in case both conditons are False
:
df['newColumn'] = np.select(
[mask0, mask1],
[df['otherColumn1'], df['otherColumn2']],
default=df['newColumn']
)
Here of course we make the assumption that newColumn
already exists in the dataframe (for example through some previous processing).
Upvotes: 3
Reputation: 71560
Another solution with np.where
:
df['newColumn'] = np.where(df['condition'].eq(1), df['otherColumn1'], df['condition'])
df['newColumn'] = np.where(df['condition'].eq(0), df['otherColumn2'], df['condition'])
Upvotes: 2
Reputation: 862431
You have to filter in both sides:
mask0 = (df['condition'] == 1)
mask1 = (df['condition'] == 0)
df.loc[mask0, 'newColumn'] = df.loc[mask0, 'otherColumn1']
df.loc[mask1, 'newColumn'] = df.loc[mask1, 'otherColumn2']
But here is better use numpy.select
for avoid repeating code:
df['newColumn'] = np.select([mask0, mask1],
[df['otherColumn1'], df['otherColumn2']],
default=np.nan)
Upvotes: 17