Reputation: 1
I'm new to python and pandas.
I'm trying to write code that compares the second-to-last row of a pandas dataframe indexed by datetime with the last row of the dataframe. However, I am having trouble getting the indexing to work properly.
For example, for a dataframe
import numpy as np
import pandas as pd
dti = pd.date_range('2018-01-01', periods=10, freq='D')
df = pd.DataFrame(np.random.randn(10, 4), index=dti, columns=['value', 'value1', 'value2', 'value3'])
Produces a dataframe
value value1 value2 value3
2018-01-01 0.249018 -0.630937 1.120733 0.343331
2018-01-02 -0.347979 0.218776 -0.327967 0.569075
2018-01-03 0.879929 0.316554 0.159652 -0.872002
2018-01-04 -1.492683 -0.560168 -0.871728 1.739402
2018-01-05 -3.187776 -1.399635 0.873503 -0.098537
2018-01-06 -0.482267 -0.174304 0.821850 1.530861
2018-01-07 0.385600 -0.395019 -0.507524 -0.033996
2018-01-08 1.358576 -2.107309 0.956507 -0.360288
2018-01-09 -0.283369 0.630684 1.148554 -1.907604
2018-01-10 0.896697 0.030671 0.131533 -1.325552
I'm trying to compare
if df.loc[last_row, ['value']] > df.loc[second_last_row, ['value']]:
# Compare to see if 'value' in 2018-01-10 is higher than 'value' in 2018-01-09
I have tried using
length = len(df)
if df.loc[length-1, ['value']] > df.loc[length-2, ['value']]:
but I get an error
TypeError: cannot do index indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [5] of <class 'int'>
It seems like the DatetimeIndex cannot be manipulated the same way as regular integer indices.
I have also considered using
df['value'].tail(1)
but I'm not sure how I would be able to access the second last row using the .tail method.
I need advice on how I would be able to retrieve data from the tail end of my dataframe based on its distance from the end of the dataframe. Thanks!
Upvotes: 0
Views: 1807
Reputation: 862681
Use:
np.random.seed(2020)
dti = pd.date_range('2018-01-01', periods=10, freq='D')
df = pd.DataFrame(np.random.randn(10, 4), index=dti,
                  columns=['value','value1','value2','value3'])
print (df)
value value1 value2 value3
2018-01-01 -1.768846 0.075552 -1.130630 -0.651430
2018-01-02 -0.893116 -1.274101 -0.061154 0.064514
2018-01-03 0.410113 -0.572882 -0.801334 1.312035
2018-01-04 1.274699 -1.214358 0.313719 -1.444821
2018-01-05 -0.368961 -0.769227 0.392616 0.057294
2018-01-06 2.089979 0.041971 -0.048341 -0.513154
2018-01-07 -0.084589 -1.215450 -1.412931 -1.486911
2018-01-08 0.382225 0.937673 1.772678 0.878828
2018-01-09 0.331719 -0.306036 1.240266 -0.215627
2018-01-10 0.155929 0.098056 0.832096 2.045205
If you want to select the last value of a column by label, you can index the DatetimeIndex by position to get the label and pass it to DataFrame.loc, like:
print (df.loc[df.index[-1], 'value'])
0.15592948059188771
print (df.loc[df.index[-2], 'value'])
0.3317191223642817
df.loc[df.index[-1], 'value'] > df.loc[df.index[-2], 'value']
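To use this comparison in an `if` statement as in the question, a minimal sketch could look like the following. It assumes the same seeded sample data as above; the names `last_label`, `second_last_label` and `last_is_higher` are only illustrative. Note that it passes the scalar column label `'value'` rather than the list `['value']`, so each lookup returns a single number instead of a one-element Series.

```python
import numpy as np
import pandas as pd

# Same sample data as above (seeded so the values are reproducible)
np.random.seed(2020)
dti = pd.date_range('2018-01-01', periods=10, freq='D')
df = pd.DataFrame(np.random.randn(10, 4), index=dti,
                  columns=['value', 'value1', 'value2', 'value3'])

# Look up the index labels of the last two rows by position
last_label = df.index[-1]
second_last_label = df.index[-2]

# A scalar column label gives a scalar result, so this is a single boolean
last_is_higher = df.loc[last_label, 'value'] > df.loc[second_last_label, 'value']

if last_is_higher:
    print('value on', last_label.date(), 'is higher than on', second_last_label.date())
else:
    print('value on', last_label.date(), 'is not higher than on', second_last_label.date())
```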
Your code can be changed to select by position with DataFrame.iloc. It is only necessary to pass the position of the column value, obtained with Index.get_loc:
length = len(df)
pos = df.columns.get_loc('value')
print (df.iloc[length-1, pos])
0.15592948059188771
print (df.iloc[length-2, pos])
0.3317191223642817
df.iloc[length-1, pos] > df.iloc[length-2, pos]
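As a possible shortcut, positional indexing also accepts negative positions, so `len(df)` is not strictly needed. The sketch below assumes the same seeded sample data and also shows one way to reach the second-to-last row through `.tail`, which the question asked about; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Same sample data as above (seeded so the values are reproducible)
np.random.seed(2020)
dti = pd.date_range('2018-01-01', periods=10, freq='D')
df = pd.DataFrame(np.random.randn(10, 4), index=dti,
                  columns=['value', 'value1', 'value2', 'value3'])

# Negative positions count from the end of the column
last = df['value'].iloc[-1]
second_last = df['value'].iloc[-2]

# tail(2) keeps the last two rows; position 0 of that result is the second-to-last row
last_two = df['value'].tail(2)
second_last_via_tail = last_two.iloc[0]

print(last > second_last)
```

Selecting the column first and then calling `.iloc` on the resulting Series avoids the need for `get_loc`, at the cost of an extra selection step.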
Upvotes: 1