Reputation: 65
I am trying to subtract (1) year from the column 'yy' in my dataframe IF the month in my 'month' column for that row == 'October', 'November', 'December'.
I've tried a number of methods and am stuck at this point. The loop below makes the entire yy2 column yy-1, rather than just the rows that have October, November, or December in the month column.
yy2 = []
for row in df['month']:
if row != 'October':
yy2 = df['yy']
else:
yy2 = (df['yy'])-1
df['yy2'] = yy2
Upvotes: 1
Views: 161
Reputation: 7058
using Series.where
month_selection = { 'October', 'November', 'December'}
df['yy2'] = df['yy'].where(~df['month'].isin(month_selection ), df['yy'] - 1)
a more naive implementation
df['yy2'] = df['yy'].copy()
sel = df['month'].isin(month_selection )
df.loc[sel, 'yy2'] = df.loc[sel, 'yy'] - 1
Upvotes: 2
Reputation: 114588
As a rule, you should not be using loops with pandas. Everything is already vectorized.
For example, your current loop computes yy2
for the entire column based on a single value at each iteration. This is not what you want presumably. Since your last month is 'October'
, your entire column gets subtracted: the operations df['yy'] - 1
computes the difference for the entire column yy
.
You can use this vectorization to your advantage.
First compute the boolean mask at which you want to subtract:
df = pd.DataFrame({'month': ['September', 'October'], 'yy': [1, 1]})
mask = (df['month'] == 'October')
Since booleans happen to be exactly equal to the integers one and zero, there are a couple of ways you can perform the subtraction. You can do it directly:
df['yy2'] = df['yy'] - mask
The more "correct" way to do it would be to only subtract at the masked location. This relies on the fact that a boolean array can be used as a logical index, or mask, into the column:
df['yy2'] = df['yy']
df.loc[mask, 'yy2'] -= 1
Getting a column like df['yy2']
creates a copy that will not necessarily write back to the original dataframe if it is modified. df.loc[:, 'yy2']
is a view to the original column that will transmit changes.
Upvotes: 0
Reputation: 1004
First rule of pandas: if you're looping over rows, you're doing it wrong.
What you want to do is use apply
:
df['yy2'] = df.apply(lambda x: x['yy'] - 1 if x['Month'] in ['October', 'November', 'December'] else x['yy'], axis=1)
Upvotes: 1