Reputation: 69
I have a dataframe with one of the column as date (DATECOL). My target is to increment the years by a "number" but the number is a variable and will be different. This number will be decided on the basis of the values of other columns.
For eg. in SQL this can be implemented as
SELECT CASE
WHEN COL1 = "AB" THEN DATEADD(year, 2, DATECOL)
WHEN COL1 = "XY" AND COL2 = "PQR" THEN DATEADD(year, 2, DATECOL)
WHEN COL1 = "XY" AND COL2 != "PQR" THEN DATEADD(year, 3, DATECOL)
END AS NEWCOL
FROM DATAFRAME
Can someone please help me implement this kind of logic in pandas.
Upvotes: 1
Views: 1916
Reputation: 116
What about using np.where with conditions as Akanksha Atrey mentioned?
np.where((first_condition) == True, add_yars, np.where(second_condition) == True, ...)
Upvotes: 0
Reputation: 107587
Consider numpy.select
for multiple logical conditions and corresponding values:
conditions = [(df['COL1'] == 'AB'),
(df['COL1'] == 'XY') & (df['COL2'] == 'PQR'),
(df['COL1'] == 'XY') & (df['COL2'] != 'PQR')]
choices = [df['DATECOL'] + pd.DateOffset(years=2),
df['DATECOL'] + pd.DateOffset(years=2),
df['DATECOL'] + pd.DateOffset(years=3)]
df['NEWCOL'] = np.select(conditions, choices, default=np.datetime64('NaT'))
Upvotes: 2
Reputation: 870
You have to first ensure your DATECOL
column is of type datetime. Then you can do this:
import pandas as pd
df = pd.DataFrame({'COL1': ['AB', 'XY', 'XY'], 'COL2': ['PQR', 'bla', 'PQR'], 'DATECOL': ['20101001', '20111001', '20121001']})
df['DATECOL'] = pd.to_datetime(df['DATECOL'], format='%Y%m%d') #change format as per your need
c1 = df['COL1']=='AB'
c2 = (df['COL1']=='XY') & (df['COL2']=='PQR')
c3 = (df['COL1']=='XY') & (df['COL2']!='PQR')
if len(df[c1]) > 0:
df.loc[c1, 'DATECOL'] += pd.offsets.DateOffset(years=2)
elif len(df[c2]) > 0:
df.loc[c2, 'DATECOL'] += pd.offsets.DateOffset(years=2)
elif len(df[c3]) > 0:
df.loc[c3, 'DATECOL'] += pd.offsets.DateOffset(years=3)
Upvotes: 0