fellowCoder
fellowCoder

Reputation: 69

Add years to a date column based on condition in pandas

I have a dataframe with one of the column as date (DATECOL). My target is to increment the years by a "number" but the number is a variable and will be different. This number will be decided on the basis of the values of other columns.

For eg. in SQL this can be implemented as

SELECT CASE
    WHEN COL1 = "AB" THEN DATEADD(year, 2, DATECOL)
    WHEN COL1 = "XY" AND COL2 = "PQR" THEN DATEADD(year, 2, DATECOL)
    WHEN COL1 = "XY" AND COL2 != "PQR" THEN DATEADD(year, 3, DATECOL)
END AS NEWCOL
FROM DATAFRAME

Can someone please help me implement this kind of logic in pandas.

Upvotes: 1

Views: 1916

Answers (3)

marmurar
marmurar

Reputation: 116

What about using np.where with conditions as Akanksha Atrey mentioned?

np.where((first_condition) == True, add_yars, np.where(second_condition) == True, ...)

Upvotes: 0

Parfait
Parfait

Reputation: 107587

Consider numpy.select for multiple logical conditions and corresponding values:

conditions = [(df['COL1'] == 'AB'),
              (df['COL1'] == 'XY') & (df['COL2'] == 'PQR'),
              (df['COL1'] == 'XY') & (df['COL2'] != 'PQR')]

choices = [df['DATECOL'] + pd.DateOffset(years=2),
           df['DATECOL'] + pd.DateOffset(years=2),
           df['DATECOL'] + pd.DateOffset(years=3)]

df['NEWCOL'] = np.select(conditions, choices, default=np.datetime64('NaT'))

Upvotes: 2

Akanksha Atrey
Akanksha Atrey

Reputation: 870

You have to first ensure your DATECOL column is of type datetime. Then you can do this:

import pandas as pd

df = pd.DataFrame({'COL1': ['AB', 'XY', 'XY'], 'COL2': ['PQR', 'bla', 'PQR'], 'DATECOL': ['20101001', '20111001', '20121001']})

df['DATECOL'] = pd.to_datetime(df['DATECOL'], format='%Y%m%d') #change format as per your need

c1 = df['COL1']=='AB'
c2 = (df['COL1']=='XY') & (df['COL2']=='PQR')
c3 = (df['COL1']=='XY') & (df['COL2']!='PQR')

if len(df[c1]) > 0:
    df.loc[c1, 'DATECOL'] += pd.offsets.DateOffset(years=2)
elif len(df[c2]) > 0:
    df.loc[c2, 'DATECOL'] += pd.offsets.DateOffset(years=2)
elif len(df[c3]) > 0:
    df.loc[c3, 'DATECOL'] += pd.offsets.DateOffset(years=3)

Upvotes: 0

Related Questions