Reputation: 972
I have a large dataframe with dates, store number, units sold, and rain precipitation totals. It looks like this...
date store_nbr units preciptotal
2014-10-11 1 0 0.00
2014-10-12 1 0 0.01
2014-10-13 1 2 0.00
2014-10-14 1 1 2.13
2014-10-15 1 0 0.00
2014-10-16 1 0 0.87
2014-10-17 1 3 0.01
2014-10-18 1 0 0.40
I want to select a three-day window around any date that has a precipitation total greater than 1. For this small example, I would want to get back the first 7 rows: the three days before 2014-10-14, 2014-10-14 itself (because it has a preciptotal greater than 1), and the three days after 2014-10-14.
Upvotes: 2
Views: 194
Reputation: 879361
Here are two ways you could build the selection mask without looping over the index values:
You could find the rows where preciptotal is greater than 1:
mask = (df['preciptotal'] > 1)
and then use scipy.ndimage.binary_dilation to expand the mask to a 7-day window:
import scipy.ndimage as ndimage
import pandas as pd
df = pd.read_table('data', sep=r'\s+')
mask = (df['preciptotal'] > 1)
mask = ndimage.binary_dilation(mask, iterations=3)
df.loc[mask]
yields
date store_nbr units preciptotal
0 2014-10-11 1 0 0.00
1 2014-10-12 1 0 0.01
2 2014-10-13 1 2 0.00
3 2014-10-14 1 1 2.13
4 2014-10-15 1 0 0.00
5 2014-10-16 1 0 0.87
6 2014-10-17 1 3 0.01
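To see why iterations=3 gives three rows on each side, here is a rough NumPy-only sketch of what dilation does to a 1-D boolean mask with the default structuring element. The helper name dilate_1d is hypothetical and exists only for illustration; it is not part of scipy.

```python
import numpy as np

def dilate_1d(mask, iterations=1):
    """Grow each run of True values by one position per iteration, on both sides."""
    out = np.asarray(mask, dtype=bool).copy()
    for _ in range(iterations):
        grown = out.copy()
        grown[1:] |= out[:-1]   # spread each True one step to the right
        grown[:-1] |= out[1:]   # spread each True one step to the left
        out = grown
    return out

# Same shape as the sample: only the fourth row exceeds the threshold.
hits = np.array([False, False, False, True, False, False, False, False])
window = dilate_1d(hits, iterations=3)
```

Each pass widens the window by one row in both directions, so three passes cover the matching row plus three neighbours on either side.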
Or, using NumPy (but without the scipy dependency), you could use mask.shift with np.logical_and.reduce:
import numpy as np
mask = (df['preciptotal'] > 1)
# fill_value=True keeps the shifted-in edge values boolean instead of NaN
mask = ~np.logical_and.reduce([(~mask).shift(i, fill_value=True) for i in range(-3, 4)]).astype(bool)
# array([ True,  True,  True,  True,  True,  True,  True, False])
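As a self-contained check of the shift idea, the sketch below rebuilds the sample data inline and wraps the expansion in a helper. The name expand_mask is hypothetical, and the sketch uses np.logical_or directly (equivalent to the negated logical_and by De Morgan's law). It assumes a pandas version where Series.shift accepts fill_value.

```python
import numpy as np
import pandas as pd

# Sample data from the question, built inline instead of read from a file.
df = pd.DataFrame({
    "date": pd.date_range("2014-10-11", periods=8, freq="D"),
    "store_nbr": 1,
    "units": [0, 0, 2, 1, 0, 0, 3, 0],
    "preciptotal": [0.00, 0.01, 0.00, 2.13, 0.00, 0.87, 0.01, 0.40],
})

def expand_mask(mask, n):
    """Mark every row within n rows of a True value in a boolean Series."""
    # fill_value=False: rows shifted in from beyond the edges count as "no match".
    shifted = [mask.shift(i, fill_value=False).to_numpy() for i in range(-n, n + 1)]
    return pd.Series(np.logical_or.reduce(shifted), index=mask.index)

window = expand_mask(df["preciptotal"] > 1, n=3)
selected = df.loc[window]
```

Here selected holds the first seven rows of the sample. Note that this works on row positions, so it only matches a calendar window if there is exactly one row per consecutive day.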
Upvotes: 3
Reputation: 393973
For a specific value you can do this:
In [84]:
idx = df[df['preciptotal'] > 1].index[0]
df.iloc[max(idx-3, 0): idx+4]  # clamp at 0 so a match in the first 3 rows does not wrap around
Out[84]:
date store_nbr units preciptotal
0 2014-10-11 1 0 0.00
1 2014-10-12 1 0 0.01
2 2014-10-13 1 2 0.00
3 2014-10-14 1 1 2.13
4 2014-10-15 1 0 0.00
5 2014-10-16 1 0 0.87
6 2014-10-17 1 3 0.01
For the more general case, you can get an array of the indices where the condition is met:
idx_vals = df[df['preciptotal'] > 1].index
Then you can generate slices or iterate over the array values:
for idx in idx_vals:
    print(df.iloc[max(idx-3, 0): idx+4])
This assumes your index is a 0-based int64 index, which is the case for your sample.
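If you want the windows from every match combined into one result, one possible sketch is below. The helper rows_near_hits is a hypothetical name, not a pandas function. It works on row positions rather than index labels, so it does not rely on a 0-based integer index, and it clips windows at the edges and drops rows shared by overlapping windows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2014-10-11", periods=8, freq="D"),
    "store_nbr": 1,
    "units": [0, 0, 2, 1, 0, 0, 3, 0],
    "preciptotal": [0.00, 0.01, 0.00, 2.13, 0.00, 0.87, 0.01, 0.40],
})

def rows_near_hits(frame, hits, n=3):
    """Return the rows within n positions of any True value in hits."""
    positions = np.flatnonzero(hits.to_numpy())
    if len(positions) == 0:
        return frame.iloc[0:0]  # no matches: empty frame with the same columns
    # One clipped window of positions per match.
    windows = [np.arange(max(p - n, 0), min(p + n + 1, len(frame))) for p in positions]
    # np.unique sorts the positions and removes overlaps between windows.
    keep = np.unique(np.concatenate(windows))
    return frame.iloc[keep]

result = rows_near_hits(df, df["preciptotal"] > 1)
```

For the sample data, result contains the rows at positions 0 through 6.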
Upvotes: 2