Reputation: 1499
I have a similar problem to the one here (dataframe by index and by integer)
What I want is to get part of the DataFrame by boolean indexing (easy) and look at a few values backward, say at the previous index and possibly a few more. Unfortunately, the answer suggested in the linked question, which uses get_loc, makes my code snippet choke (a type error in the following snippet) before I can get the actual integer locations.
Taking the same example as in the answer in the other question, here is what I tried:
df = pd.DataFrame(index=pd.date_range(start=dt.datetime(2015,1,1), end=dt.datetime(2015,2,1)), data={'a':np.arange(32)})
df.index.get_loc(df.index[df['a'] == 1])
*** TypeError: Cannot convert input to TimeStamp
The previous answer used a string for get_loc, where I'd just like to pass a plain index value (here a DateTime).
Upvotes: 1
Views: 3117
Reputation: 880717
Using a slice:
import numpy as np
import pandas as pd
import datetime as DT
index = pd.date_range(start=DT.datetime(2015,1,1), end=DT.datetime(2015,2,1))
df = pd.DataFrame({'a':np.arange(len(index))}, index=index)
mask = df['a'] == 10
idx = np.flatnonzero(mask)[0]  # integer position of the first True in mask
lookback = 3
print(df.iloc[max(idx-lookback, 0):idx+1])  # clamp the start at 0 so it never goes negative
yields
             a
2015-01-08   7
2015-01-09   8
2015-01-10   9
2015-01-11  10
Note that if idx-lookback is negative, then the index refers to elements near the tail of df, just like with Python lists:
In [163]: df.iloc[-3:2]
Out[163]:
Empty DataFrame
Columns: [a]
Index: []
In [164]: df.iloc[0:2]
Out[164]:
            a
2015-01-01  0
2015-01-02  1
Thus, to grab elements relative to the head of df, use max(idx-lookback, 0).
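As for get_loc itself: it expects a single label rather than an Index of labels, which appears to be why the call in the question fails. The following is a minimal sketch, assuming the index is unique (so that get_loc returns a plain integer) and that at least one row matches. It picks one scalar Timestamp out of the boolean selection before calling get_loc. The names label and window are only illustrative.

```python
import numpy as np
import pandas as pd

index = pd.date_range(start="2015-01-01", end="2015-02-01")
df = pd.DataFrame({"a": np.arange(len(index))}, index=index)

mask = df["a"] == 10

# df.index[mask] is a DatetimeIndex (it may hold several labels),
# so take one scalar Timestamp from it before calling get_loc.
label = df.index[mask][0]

# Assumption: the index is unique, so get_loc returns an integer position.
idx = df.index.get_loc(label)

lookback = 3
# Clamp the start at 0 so a negative start is not read relative to the tail.
window = df.iloc[max(idx - lookback, 0): idx + 1]
print(window)
```

If the index could contain duplicate labels, get_loc may return a slice or a boolean array instead of an integer, in which case np.flatnonzero(mask) is the simpler route.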
Using a boolean mask:
As you know, if you have a boolean array or boolean Series such as
mask = df['a'] == 10
you can select the corresponding rows with
df.loc[mask]
If you wish to select previous or succeeding rows shifted by a fixed amount, you could use mask.shift to shift the mask (passing fill_value=False keeps the result a boolean Series, so no fillna is needed):
df.loc[mask.shift(-lookback, fill_value=False)]
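To make the direction of the shift concrete, here is a small sketch using the same example frame. A negative shift moves each True toward earlier rows, a positive shift toward later rows. The variable names before and after are only illustrative, and fill_value=False is used so the shifted mask stays boolean.

```python
import numpy as np
import pandas as pd

index = pd.date_range(start="2015-01-01", end="2015-02-01")
df = pd.DataFrame({"a": np.arange(len(index))}, index=index)

mask = df["a"] == 10
lookback = 3

# Negative shift: the True at position 10 moves to position 7,
# so this selects the single row `lookback` positions before the match.
before = df.loc[mask.shift(-lookback, fill_value=False)]

# Positive shift: the True moves to position 13,
# so this selects the single row `lookback` positions after the match.
after = df.loc[mask.shift(lookback, fill_value=False)]

print(before)
print(after)
```

Note that each of these selects exactly one row per match, not the whole window between the shifted row and the match.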
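If you wish to select the lookback preceding rows as well, then you could expand the mask by unioning it with its shifts:
lookback = 3
original = mask.copy()
for i in range(1, lookback + 1):
    # shift the original mask, not the growing one, so each step adds exactly one row
    mask |= original.shift(-i, fill_value=False)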
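or, provided no match lies within the first lookback rows, equivalently use cumsum:
m = mask.astype(int)  # integers, so the shifted masks can be subtracted
mask = (m.shift(-lookback, fill_value=0) - m.shift(1, fill_value=0)).cumsum().astype(bool)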
The for-loop is clearer, but the cumsum expression is faster, particularly if lookback is large.
For example,
import numpy as np
import pandas as pd
import datetime as DT
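df = pd.DataFrame(
    index=pd.date_range(start=DT.datetime(2015,1,1), end=DT.datetime(2015,2,1)),
    data={'a':np.arange(32)})
mask = df['a'] == 10
lookback = 3
original = mask.copy()
for i in range(1, lookback + 1):
    # shift the original mask, not the growing one, so each step adds exactly one row
    mask |= original.shift(-i, fill_value=False)
# alternatively, instead of the loop:
# m = mask.astype(int)
# mask = (m.shift(-lookback, fill_value=0) - m.shift(1, fill_value=0)).cumsum().astype(bool)
print(df.loc[mask])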
yields
             a
2015-01-08   7
2015-01-09   8
2015-01-10   9
2015-01-11  10
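When the mask has more than one match, both approaches should still agree as long as every match sits at least lookback rows from the start of the frame. The sketch below wraps each approach in a helper and checks that they produce the same mask. The helper names expand_with_loop and expand_with_cumsum are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def expand_with_loop(mask, lookback):
    """Union the mask with copies of itself shifted 1..lookback rows earlier."""
    expanded = mask.copy()
    for i in range(1, lookback + 1):
        expanded |= mask.shift(-i, fill_value=False)
    return expanded


def expand_with_cumsum(mask, lookback):
    """Put +1 at each window start and -1 just after each match, then accumulate.

    Assumption: no match lies within the first `lookback` rows; otherwise the
    +1 falls off the front of the Series and the running sum goes negative.
    """
    m = mask.astype(int)
    steps = m.shift(-lookback, fill_value=0) - m.shift(1, fill_value=0)
    return steps.cumsum().astype(bool)


index = pd.date_range(start="2015-01-01", end="2015-02-01")
df = pd.DataFrame({"a": np.arange(len(index))}, index=index)

# Two matches, both far enough from the start for every lookback tried below.
mask = df["a"].isin([10, 20])

for lookback in (1, 3, 5):
    loop_mask = expand_with_loop(mask, lookback)
    cumsum_mask = expand_with_cumsum(mask, lookback)
    assert loop_mask.equals(cumsum_mask)

selected = df.loc[expand_with_loop(mask, 3), "a"].tolist()
print(selected)
```

The loop version clips windows that would run past the head of the frame, so it is the safer choice when matches can occur in the first few rows.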
Upvotes: 2