Champion

Reputation: 105

Dataframe: index values difference calculation

For a pandas DataFrame, I can select (based on some criterion) the index value (a date) of one row ('first date'), as well as the index value of the very last row ('last date').

I would like to explicitly calculate the difference in days between 'first date' and 'last date' (it should be 3). How can I do this?

Many thanks in advance!

import pandas as pd
df1 = pd.DataFrame({"date": ['2021-3-22', '2021-3-23', '2021-3-24', '2021-3-25', '2021-3-26'],
"x": ['1', 1, 'nan', 'nan', 'nan' ]})
df1.set_index('date', inplace=True)
df1

    
date         x  
2021-3-22   1
2021-3-23   1
2021-3-24   nan
2021-3-25   nan
2021-3-26   nan

print('first date:', df1.x[df1.x == 1].tail(1).index.values)
first date: ['2021-3-23']
(=d1)

print('last date:', df1.tail(1).index.values)
last date: ['2021-3-26']
(=d2)

d2-d1=?


Upvotes: 0

Views: 40

Answers (2)

alec_djinn

Reputation: 10799

You forgot to set the type of the 'date' column. Currently, it is a string but you want it to be a datetime instance instead.

df1 = pd.DataFrame({"date": ['2021-3-22', '2021-3-23', '2021-3-24', '2021-3-25', '2021-3-26'],
"x": ['nan', 1, 'nan', 'nan', 'nan' ]})

df1['date'] = pd.to_datetime(df1['date'])

df1 = df1.set_index('date')

Now you can do:

df1.index[-1] - df1.index[0] #Timedelta('4 days 00:00:00')

or

(df1.index[-1] - df1.index[0]).days #4
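Note that the span of the whole index is 4 days, whereas the question asks for the gap between the last row where x == 1 and the last row, which should be 3. A minimal sketch of that selection, reusing the frame from this answer and the question's labels d1 and d2 (n_days is just an illustrative name), might look like this:

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": ["2021-3-22", "2021-3-23", "2021-3-24", "2021-3-25", "2021-3-26"],
    "x": ["nan", 1, "nan", "nan", "nan"],
})

# Parse the strings into datetimes before using them as the index
df1["date"] = pd.to_datetime(df1["date"])
df1 = df1.set_index("date")

# 'first date': index of the last row that satisfies the criterion x == 1
d1 = df1.loc[df1["x"] == 1].index[-1]
# 'last date': index of the very last row
d2 = df1.index[-1]

# Subtracting two Timestamps gives a Timedelta; .days is its whole-day part
n_days = (d2 - d1).days
print(n_days)
```

Because the index holds Timestamp values after the conversion, the subtraction works directly on the selected index entries.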

Upvotes: 2

nilanjan_dk

Reputation: 106

You need to convert the dates (the index values) to datetime format. You can use either the datetime library or pandas to do it. For your requirement, you can do the following:

(pd.to_datetime(df1.tail(1).index.values, format='%Y-%m-%d') 
 - pd.to_datetime(df1.x[df1.x == 1].tail(1).index.values, format='%Y-%m-%d')).days[0]

pd.to_datetime converts the string values to datetime values, which can then be subtracted from each other.
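The datetime-library route mentioned above is not spelled out in this answer. A minimal sketch, assuming the two index values have already been pulled out as the plain strings shown in the question (d1 and d2 are the question's labels; fmt and n_days are illustrative names), could be:

```python
from datetime import datetime

# The two dates from the question, still as strings
d1 = "2021-3-23"  # 'first date'
d2 = "2021-3-26"  # 'last date'

fmt = "%Y-%m-%d"

# Parse both strings, subtract to get a timedelta, and read off the days
n_days = (datetime.strptime(d2, fmt) - datetime.strptime(d1, fmt)).days
print(n_days)
```

This avoids pandas for the subtraction itself, which may be convenient when only two scalar dates are involved.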

Upvotes: 1
