Damian Goh

Reputation: 1

Slicing a datetimeindex dataframe

I'm new to Python and pandas.

I'm trying to write code that compares the second-to-last row of a pandas DataFrame indexed by dates with its last row. However, I can't get the indexing to select those rows correctly.

For example, this code:

import numpy as np
import pandas as pd

dti = pd.date_range('2018-01-01', periods=10, freq='D')
df = pd.DataFrame(np.random.randn(10, 4), index=dti, columns=['value', 'value1', 'value2', 'value3'])

produces a DataFrame like this (the values are random):

               value    value1    value2    value3
2018-01-01  0.249018 -0.630937  1.120733  0.343331
2018-01-02 -0.347979  0.218776 -0.327967  0.569075
2018-01-03  0.879929  0.316554  0.159652 -0.872002
2018-01-04 -1.492683 -0.560168 -0.871728  1.739402
2018-01-05 -3.187776 -1.399635  0.873503 -0.098537
2018-01-06 -0.482267 -0.174304  0.821850  1.530861
2018-01-07  0.385600 -0.395019 -0.507524 -0.033996
2018-01-08  1.358576 -2.107309  0.956507 -0.360288
2018-01-09 -0.283369  0.630684  1.148554 -1.907604
2018-01-10  0.896697  0.030671  0.131533 -1.325552

I'm trying to do a comparison like this, where last_row and second_last_row stand for the last two rows:

# Check whether 'value' on 2018-01-10 is higher than 'value' on 2018-01-09
if df.loc[last_row, ['value']] > df.loc[second_last_row, ['value']]:
    ...

I have tried using

length = len(df)
if df.loc[length-1, ['value']] > df.loc[length-2, ['value']]:
    ...

but I get this error:

TypeError: cannot do index indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [5] of <class 'int'>

It seems a DatetimeIndex cannot be indexed by integers the same way as a regular integer index.
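For reference, the distinction behind this error: `.loc` looks rows up by label, and with a DatetimeIndex the labels are Timestamps, while `.iloc` looks rows up by integer position. The sketch below rebuilds the example frame and reads the same cell both ways; `by_label` and `by_position` are illustrative names only.

```python
import numpy as np
import pandas as pd

dti = pd.date_range('2018-01-01', periods=10, freq='D')
df = pd.DataFrame(np.random.randn(10, 4), index=dti,
                  columns=['value', 'value1', 'value2', 'value3'])

# .loc: row label (a Timestamp for a DatetimeIndex) plus column label
by_label = df.loc[pd.Timestamp('2018-01-10'), 'value']

# .iloc: integer row position plus integer column position
by_position = df.iloc[9, df.columns.get_loc('value')]

print(by_label == by_position)  # both address the same cell
print(9 in df.index)            # 9 is a position, not a label of this index
```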

I have also considered using

df['value'].tail(1)

but I'm not sure how to access the second-to-last row with the .tail method.

How can I retrieve values from the end of my DataFrame based on their distance from the last row? Thanks!

Upvotes: 0

Views: 1807

Answers (1)

jezrael

Reputation: 862681

Use:

import numpy as np
import pandas as pd

np.random.seed(2020)
dti = pd.date_range('2018-01-01', periods=10, freq='D')
df = pd.DataFrame(np.random.randn(10, 4), index=dti, 
                  columns=['value','value1','value2','value3'])
print (df)
               value    value1    value2    value3
2018-01-01 -1.768846  0.075552 -1.130630 -0.651430
2018-01-02 -0.893116 -1.274101 -0.061154  0.064514
2018-01-03  0.410113 -0.572882 -0.801334  1.312035
2018-01-04  1.274699 -1.214358  0.313719 -1.444821
2018-01-05 -0.368961 -0.769227  0.392616  0.057294
2018-01-06  2.089979  0.041971 -0.048341 -0.513154
2018-01-07 -0.084589 -1.215450 -1.412931 -1.486911
2018-01-08  0.382225  0.937673  1.772678  0.878828
2018-01-09  0.331719 -0.306036  1.240266 -0.215627
2018-01-10  0.155929  0.098056  0.832096  2.045205

If you want to select the last value of a column, you can take the label from the DatetimeIndex by position and pass it to DataFrame.loc:

print (df.loc[df.index[-1], 'value'])
0.15592948059188771

print (df.loc[df.index[-2], 'value'])
0.3317191223642817

df.loc[df.index[-1], 'value'] > df.loc[df.index[-2], 'value']
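A slightly shorter variant (an alternative, not part of the original answer) selects the column first and then indexes the resulting Series by position with `Series.iloc`, where negative positions count from the end. `last` and `second_last` are illustrative names.

```python
import numpy as np
import pandas as pd

np.random.seed(2020)
dti = pd.date_range('2018-01-01', periods=10, freq='D')
df = pd.DataFrame(np.random.randn(10, 4), index=dti,
                  columns=['value', 'value1', 'value2', 'value3'])

s = df['value']           # Series that keeps the DatetimeIndex
last = s.iloc[-1]         # value on the last row (2018-01-10)
second_last = s.iloc[-2]  # value on the second-to-last row (2018-01-09)

# df.index[-1] and df.index[-2] give the matching dates if you need them
print(df.index[-1].date(), last)
print(df.index[-2].date(), second_last)

if last > second_last:
    print('value increased')
else:
    print('value did not increase')
```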

To fix your code, select by position with DataFrame.iloc instead; you only need to pass the position of the column value, which Index.get_loc returns:

length = len(df)
pos = df.columns.get_loc('value')

print (df.iloc[length-1, pos])
0.15592948059188771

print (df.iloc[length-2, pos])
0.3317191223642817

df.iloc[length-1, pos] > df.iloc[length-2, pos]
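The `.tail` idea from the question also works: `tail(2)` returns the last two values, which can then be indexed by position. The function `value_from_end` below is a hypothetical helper, not a pandas API; it is a minimal sketch that generalizes the lookup to "the value n rows from the end" (n=1 is the last row), assuming the column exists and the frame is not empty.

```python
import numpy as np
import pandas as pd

np.random.seed(2020)
dti = pd.date_range('2018-01-01', periods=10, freq='D')
df = pd.DataFrame(np.random.randn(10, 4), index=dti,
                  columns=['value', 'value1', 'value2', 'value3'])

# tail(2) keeps the last two values; position 0 is second-to-last, 1 is last
last_two = df['value'].tail(2)
second_last, last = last_two.iloc[0], last_two.iloc[1]
print(last > second_last)


def value_from_end(frame, column, n):
    """Hypothetical helper: value of `column` n rows from the end (n=1 -> last row)."""
    if not 1 <= n <= len(frame):
        raise IndexError(f'n must be between 1 and {len(frame)}, got {n}')
    return frame[column].iloc[-n]


print(value_from_end(df, 'value', 1) > value_from_end(df, 'value', 2))
```

The explicit range check is a design choice: plain `iloc[-0]` would silently return the first row, so rejecting n=0 avoids that surprise.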

Upvotes: 1
