Gingerbread

Reputation: 2122

NaNs when extracting the number of days between two dates in pandas

I have a dataframe that contains the columns company_id, seniority, join_date and quit_date. I am trying to extract the number of days between join date and quit date. However, I get NaNs.

If I drop all the columns in the dataframe except quit_date and join_date and run the same code again, I get what I expect. With all the columns present, however, I get NaNs.

Here's my code:

df['join_date'] = pd.to_datetime(df['join_date'])
df['quit_date'] = pd.to_datetime(df['quit_date'])
df['days'] = df['quit_date'] - df['join_date']
df['days'] = df['days'].astype(str)
df1 = pd.DataFrame(df.days.str.split(' ').tolist(), columns = ['days', 'unwanted', 'stamp'])
df['numberdays'] = df1['days']

This is what I get:

days                  numberdays
585 days 00:00:00        NaN
340 days 00:00:00        NaN

I want the 585 from the 'days' column to appear in the 'numberdays' column, and similarly for every other row.

Can someone help me with this?

Thank you!

Upvotes: 2

Views: 1054

Answers (1)

3novak

Reputation: 2544

Instead of converting to string, extract the number of days from the timedelta value using the dt accessor.

import pandas as pd

df = pd.DataFrame({'join_date': ['2014-03-24', '2013-04-29', '2014-10-13'],
                   'quit_date':['2015-10-30', '2014-04-04', '']})
df['join_date'] = pd.to_datetime(df['join_date'])
df['quit_date'] = pd.to_datetime(df['quit_date'])

df['days'] = df['quit_date'] - df['join_date']
df['number_of_days'] = df['days'].dt.days
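
As for where the NaNs in the question may come from: the question does not show the dataframe's index, so this is an assumption, but a common cause is index alignment. A frame built from a plain list gets a fresh 0..n-1 index, and assigning one of its columns back to df matches rows by label rather than by position. The sketch below uses a made-up index of 10 and 11 to reproduce the effect; the helper frame df1 is only a stand-in for the split-off frame in the question.

```python
import pandas as pd

# Hypothetical data: a non-default index such as one left over from filtering.
df = pd.DataFrame({'join_date': ['2014-03-24', '2013-04-29'],
                   'quit_date': ['2015-10-30', '2014-04-04']},
                  index=[10, 11])
df['join_date'] = pd.to_datetime(df['join_date'])
df['quit_date'] = pd.to_datetime(df['quit_date'])
df['days'] = df['quit_date'] - df['join_date']

# Building a new frame from a plain list discards df's index,
# so df1 is labelled 0 and 1 instead of 10 and 11.
df1 = pd.DataFrame({'days': df['days'].dt.days.astype(str).tolist()})

# Assignment aligns on index labels; none match, so every row becomes NaN.
df['numberdays'] = df1['days']

# Reading the days straight from the timedelta column keeps df's own index.
df['number_of_days'] = df['days'].dt.days
```

Because dt.days works on the existing column, no second frame is created and there is nothing to misalign.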

@Mohammad Yusuf Ghazi points out that dt.days applies only to timedelta data. On datetime data the corresponding attribute is dt.day, which returns the day of the month rather than an elapsed number of days.
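
A small sketch of the difference between the two attributes, using made-up dates and an arbitrary reference date of 2014-01-01:

```python
import pandas as pd

# Illustrative values only; the reference date is an arbitrary choice.
dates = pd.Series(pd.to_datetime(['2015-10-30', '2014-04-04']))
deltas = dates - pd.Timestamp('2014-01-01')

# datetime64 column: dt.day is the day-of-month component.
day_of_month = dates.dt.day

# timedelta64 column: dt.days is the whole number of elapsed days.
elapsed_days = deltas.dt.days

print(day_of_month.tolist())   # [30, 4]
print(elapsed_days.tolist())   # [667, 93]
```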

Upvotes: 3
