Ladenkov Vladislav

Reputation: 1297

Applymap interface for operations on several (two) columns

Suppose I have a DataFrame:

import pandas as pd

df = pd.DataFrame({'DATE_1': ['2010-11-06', '2010-10-07', '2010-09-07', '2010-05-07'],
                   'DATE_2': ['2010-12-07', '2010-11-06', '2010-10-07', '2010-08-06']})
# convert the string columns to datetime64
df['DATE_1'] = pd.to_datetime(df['DATE_1'])
df['DATE_2'] = pd.to_datetime(df['DATE_2'])

So it looks like:

      DATE_1      DATE_2
0   2010-11-06  2010-12-07
1   2010-10-07  2010-11-06
2   2010-09-07  2010-10-07
3   2010-05-07  2010-08-06

I want to create another column, DIFF, which is the difference between DATE_2 and DATE_1 in days, months, or years.
I want an interface like the one below, because I'll have to create many columns similar to DIFF from many DATE_X columns:

def date_diffrence(x, y, parameter):
    # parameter is meant to select the unit (days, months or years)
    if not pd.isna(x):
        return (x - y)
df['DIFF'] = df.apply(lambda row: date_diffrence(row['DATE_2'], row['DATE_1'], 'days'), axis=1)

According to this post, Difference between map, applymap and apply methods in Pandas, it seems to me that I'm not able to create such a universal interface. Am I right?

Upvotes: 2

Views: 376

Answers (1)

jezrael

Reputation: 863176

It seems you don't need apply at all: write a function that takes Series (columns of df) as arguments and use dt.days on the result:

def date_diffrence_days(x, y):
    return (x-y).dt.days

df['DIFF'] = date_diffrence_days(df['DATE_2'], df['DATE_1'])
print(df)
      DATE_1     DATE_2  DIFF
0 2010-11-06 2010-12-07    31
1 2010-10-07 2010-11-06    30
2 2010-09-07 2010-10-07    30
3 2010-05-07 2010-08-06    91

Which is the same as:

df['DIFF'] = (df['DATE_2'] - df['DATE_1']).dt.days
print(df)
      DATE_1     DATE_2  DIFF
0 2010-11-06 2010-12-07    31
1 2010-10-07 2010-11-06    30
2 2010-09-07 2010-10-07    30
3 2010-05-07 2010-08-06    91
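
Since the subtraction is vectorized, the same helper can probably be reused for many column pairs with a plain loop. A minimal sketch, assuming a hypothetical mapping `pairs` from the new column name to the (later, earlier) date columns; the DATE_3 column and the names DIFF_21 and DIFF_32 are made up for illustration:

```python
import pandas as pd

def date_diffrence_days(x, y):
    # x and y are datetime64 Series; the result is a Series of whole days
    return (x - y).dt.days

df = pd.DataFrame({
    'DATE_1': pd.to_datetime(['2010-11-06', '2010-10-07']),
    'DATE_2': pd.to_datetime(['2010-12-07', '2010-11-06']),
    # hypothetical extra column with one missing value (NaT)
    'DATE_3': pd.to_datetime(['2011-01-01', None]),
})

# hypothetical mapping: new column -> (later date column, earlier date column)
pairs = {
    'DIFF_21': ('DATE_2', 'DATE_1'),
    'DIFF_32': ('DATE_3', 'DATE_2'),
}

for new_col, (end, start) in pairs.items():
    df[new_col] = date_diffrence_days(df[end], df[start])

print(df)
```

No explicit missing-value check is needed here: a NaT in either column propagates through the subtraction and shows up as NaN in the result column.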

EDIT:

def date_diffrence_days(x, y, parameter):
    # parameter selects the unit of the result
    if parameter == 'd':
        return (x - y).dt.days              # whole days
    elif parameter == 's':
        return (x - y).dt.total_seconds()   # seconds, as float

df['DIFF'] = date_diffrence_days(df['DATE_2'], df['DATE_1'], 's')
print(df)
      DATE_1     DATE_2       DIFF
0 2010-11-06 2010-12-07  2678400.0
1 2010-10-07 2010-11-06  2592000.0
2 2010-09-07 2010-10-07  2592000.0
3 2010-05-07 2010-08-06  7862400.0
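
The question also asks for months and years, which a Timedelta cannot express directly because their length varies. One possible sketch, assuming it is acceptable to count calendar boundaries crossed (so 2010-01-31 to 2010-02-01 counts as 1 month); the function name `date_difference_unit` and the unit codes 'D', 'M', 'Y' are hypothetical, not part of pandas:

```python
import pandas as pd

def date_difference_unit(x, y, unit='D'):
    """Vectorized difference x - y between two datetime64 Series."""
    if unit == 'D':
        # whole days from the Timedelta
        return (x - y).dt.days
    if unit == 'M':
        # calendar months: compare the (year, month) components only
        return (x.dt.year - y.dt.year) * 12 + (x.dt.month - y.dt.month)
    if unit == 'Y':
        # calendar years: compare the year component only
        return x.dt.year - y.dt.year
    raise ValueError(f"unknown unit: {unit!r}")

df = pd.DataFrame({
    'DATE_1': pd.to_datetime(['2010-11-06', '2010-05-07']),
    'DATE_2': pd.to_datetime(['2010-12-07', '2010-08-06']),
})

# one result column per unit
for unit in ('D', 'M', 'Y'):
    df['DIFF_' + unit] = date_difference_unit(df['DATE_2'], df['DATE_1'], unit)

print(df)
```

If elapsed full months are needed instead (taking the day of month into account), the 'M' branch would have to be adjusted; this sketch deliberately ignores the day component.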

Upvotes: 1
