G.G.
G.G.

Reputation: 65

Subtract -1 from Year if month = October, November, or December, Else year python

I am trying to subtract (1) year from the column 'yy' in my dataframe IF the month in my 'month' column for that row == 'October', 'November', 'December'.

I've tried a number of methods and am stuck at this point. The loop below makes the entire yy2 column yy-1, rather than just the rows that have October, November, or December in the month column.

yy2 = []

for row in df['month']:
    if row != 'October':
        yy2 = df['yy']
    else:
        yy2 = (df['yy'])-1

df['yy2'] = yy2

Upvotes: 1

Views: 161

Answers (3)

Maarten Fabré
Maarten Fabré

Reputation: 7058

using Series.where

month_selection = { 'October', 'November', 'December'}
df['yy2'] = df['yy'].where(~df['month'].isin(month_selection ), df['yy'] - 1)

a more naive implementation

df['yy2'] = df['yy'].copy()
sel = df['month'].isin(month_selection )
df.loc[sel, 'yy2'] = df.loc[sel, 'yy'] - 1

Upvotes: 2

Mad Physicist
Mad Physicist

Reputation: 114588

As a rule, you should not be using loops with pandas. Everything is already vectorized.

For example, your current loop computes yy2 for the entire column based on a single value at each iteration. This is not what you want presumably. Since your last month is 'October', your entire column gets subtracted: the operations df['yy'] - 1 computes the difference for the entire column yy.

You can use this vectorization to your advantage.

First compute the boolean mask at which you want to subtract:

df = pd.DataFrame({'month': ['September', 'October'], 'yy': [1, 1]})
mask = (df['month'] == 'October')

Since booleans happen to be exactly equal to the integers one and zero, there are a couple of ways you can perform the subtraction. You can do it directly:

df['yy2'] = df['yy'] - mask

The more "correct" way to do it would be to only subtract at the masked location. This relies on the fact that a boolean array can be used as a logical index, or mask, into the column:

df['yy2'] = df['yy']
df.loc[mask, 'yy2'] -= 1

Getting a column like df['yy2'] creates a copy that will not necessarily write back to the original dataframe if it is modified. df.loc[:, 'yy2'] is a view to the original column that will transmit changes.

Upvotes: 0

Silenced Temporarily
Silenced Temporarily

Reputation: 1004

First rule of pandas: if you're looping over rows, you're doing it wrong.

What you want to do is use apply:

df['yy2'] = df.apply(lambda x: x['yy'] - 1 if x['Month'] in ['October', 'November', 'December'] else x['yy'], axis=1)

Upvotes: 1

Related Questions