daiyue

Reputation: 7458

pandas AttributeError: 'DataFrame' object has no attribute 'dt' when using apply on groupby

I have the following DataFrame `df`:

code    date1        date2
2000    2018-03-21   2018-04-04
2000    2018-03-22   2018-04-05
2000    2018-03-23   2018-04-06

When I tried

df_code_grp_by = df.groupby(['code'])

df_code_grp_by.apply(lambda x: x.date2 - x.date1).dt.days.sum(level=0).reset_index(name='date_diff_sum')

I got

AttributeError: 'DataFrame' object has no attribute 'dt'

date1 and date2 are both dtype('<M8[ns]'). How can I fix this?

I am using Pandas 0.22.0, Python 3.5.2 and Numpy 1.15.4.

Upvotes: 4

Views: 7278

Answers (1)

jezrael

Reputation: 863791

It is better here to set the code column as the index and subtract the two Series directly:

df = df.set_index('code')
df = (df.date2 - df.date1).dt.days.sum(level=0).reset_index(name='date_diff_sum')
print (df)
   code  date_diff_sum
0  2000             42
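On newer pandas releases `sum(level=0)` is no longer available (the `level` argument was deprecated and later removed), so the same idea can be written with an explicit groupby. The following is a minimal, self-contained sketch rather than a definitive implementation; the sample frame is rebuilt from the question's data, and the names `day_diff` and `result` are chosen here for illustration only.

```python
import pandas as pd

# Rebuild the sample data from the question (both date columns are datetime64[ns]).
df = pd.DataFrame({
    "code": [2000, 2000, 2000],
    "date1": pd.to_datetime(["2018-03-21", "2018-03-22", "2018-03-23"]),
    "date2": pd.to_datetime(["2018-04-04", "2018-04-05", "2018-04-06"]),
})

# Subtracting two datetime Series gives a timedelta Series, which has a .dt accessor.
day_diff = (df["date2"] - df["date1"]).dt.days

# Group the per-row day counts by the code column and sum them per group.
result = day_diff.groupby(df["code"]).sum().reset_index(name="date_diff_sum")
print(result)
```

Because the subtraction happens on Series before any grouping, `.dt` is always applied to a timedelta Series and never to the DataFrame that `apply` can return.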

The problem with your code is that apply returns a DataFrame with one row per group here (maybe a bug), and a DataFrame has no .dt accessor:

df_code_grp_by = df.groupby(['code'])

df = df_code_grp_by.apply(lambda x: x.date2 - x.date1)
print (df)
                     0                 1                 2
code                                                      
2000  1209600000000000  1209600000000000  1209600000000000

A possible solution is to use np.sum, so that apply returns a single timedelta per group:

import numpy as np

df = (df_code_grp_by.apply(lambda x: np.sum(x.date2 - x.date1))
                    .dt.days
                    .reset_index(name='date_diff_sum'))
print (df)
   code  date_diff_sum
0  2000             42

Upvotes: 2
