Hound

Reputation: 972

Slicing based on dates Pandas Dataframe

I have a large dataframe with dates, store number, units sold, and rain precipitation totals. It looks like this...

  date        store_nbr      units            preciptotal
  2014-10-11       1          0                0.00
  2014-10-12       1          0                0.01
  2014-10-13       1          2                0.00
  2014-10-14       1          1                2.13
  2014-10-15       1          0                0.00
  2014-10-16       1          0                0.87
  2014-10-17       1          3                0.01
  2014-10-18       1          0                0.40

I want to select a window of three days on either side of any date that has a precipitation total greater than 1. For this small example, I would want to get back the first 7 rows: the 3 days before 2014-10-14, the 3 days after 2014-10-14, and 2014-10-14 itself, because it has a preciptotal greater than 1.

Upvotes: 2

Views: 194

Answers (2)

unutbu

Reputation: 879361

Here are two ways you could build the selection mask without looping over the index values:

You could find the rows where preciptotal is greater than 1:

mask = (df['preciptotal'] > 1)

and then use scipy.ndimage.binary_dilation to expand the mask to a 7-row window (7 days, if there is one row per day):

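import scipy.ndimage as ndimage
import pandas as pd

# Read the sample table from a whitespace-separated text file named 'data'
df = pd.read_table('data', sep=r'\s+')

mask = (df['preciptotal'] > 1)
# Each iteration grows every True run by one row on each side,
# so iterations=3 marks every row within 3 rows of a wet day
mask = ndimage.binary_dilation(mask, iterations=3)

df.loc[mask]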

yields

         date  store_nbr  units  preciptotal
0  2014-10-11          1      0         0.00
1  2014-10-12          1      0         0.01
2  2014-10-13          1      2         0.00
3  2014-10-14          1      1         2.13
4  2014-10-15          1      0         0.00
5  2014-10-16          1      0         0.87
6  2014-10-17          1      3         0.01
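If SciPy isn't available, the same dilation can be sketched with np.convolve. This is a swapped-in technique, not part of the answer above: convolving the 0/1 mask with a length-7 kernel of ones counts the wet rows within 3 positions of each row, and any nonzero count means the row is kept. It assumes the frame has at least 7 rows, because mode='same' returns an array as long as the longer of the two inputs. The DataFrame is rebuilt in memory from the question's sample rather than read from the 'data' file.

```python
import numpy as np
import pandas as pd

# Sample data from the question, built in memory instead of read from a file
df = pd.DataFrame({
    'date': pd.date_range('2014-10-11', periods=8).strftime('%Y-%m-%d'),
    'store_nbr': 1,
    'units': [0, 0, 2, 1, 0, 0, 3, 0],
    'preciptotal': [0.00, 0.01, 0.00, 2.13, 0.00, 0.87, 0.01, 0.40],
})

wet = (df['preciptotal'] > 1).to_numpy().astype(int)
# counts[n] = number of wet rows in positions n-3 .. n+3
# (a centered length-7 kernel of ones; assumes len(df) >= 7)
counts = np.convolve(wet, np.ones(7, dtype=int), mode='same')
mask = counts > 0

print(df.loc[mask])
```

For this sample the selected rows are the same seven shown above.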

Or, using NumPy and pandas alone (without the SciPy dependency), you could combine shifted copies of the mask with Series.shift and np.logical_and.reduce:

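import numpy as np
mask = (df['preciptotal'] > 1)
# dry[n]: no wet row within 3 positions of n; positions past the ends count as dry
dry = np.logical_and.reduce([(~mask).shift(i, fill_value=True) for i in range(-3, 4)])
mask = ~dry
# array([ True,  True,  True,  True,  True,  True,  True, False])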
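A pandas-only alternative, again a swapped-in technique rather than either method in this answer, is a centered rolling maximum over the 0/1 mask. Like the approaches above it counts rows, not calendar days, so it assumes one row per day. Only the preciptotal column of the sample is rebuilt here.

```python
import pandas as pd

df = pd.DataFrame({
    'preciptotal': [0.00, 0.01, 0.00, 2.13, 0.00, 0.87, 0.01, 0.40],
})

wet = (df['preciptotal'] > 1).astype(int)
# Centered 7-row window: the max is 1 if any row within 3 positions is wet.
# min_periods=1 lets the partial windows at both ends still produce a value.
mask = wet.rolling(window=7, center=True, min_periods=1).max() > 0

print(df[mask])
```

The result is a boolean Series aligned with df.index, so it can be passed straight to df[mask] or df.loc[mask].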

Upvotes: 3

EdChum

Reputation: 393973

For a single matching row, you can take the first index where the condition holds and slice around it:

In [84]:

idx = df[df['preciptotal'] > 1].index[0]
df.iloc[max(idx - 3, 0): idx + 4]  # max() avoids a negative start near the top
Out[84]:
        date  store_nbr  units  preciptotal
0 2014-10-11          1      0         0.00
1 2014-10-12          1      0         0.01
2 2014-10-13          1      2         0.00
3 2014-10-14          1      1         2.13
4 2014-10-15          1      0         0.00
5 2014-10-16          1      0         0.87
6 2014-10-17          1      3         0.01

For the more general case, you can get an array of the indices where the condition is met:

idx_vals = df[df['preciptotal'] > 1].index

then you can generate slices or iterate over the array values:

for idx in idx_vals:
    window = df.iloc[max(idx - 3, 0): idx + 4]  # rows idx-3 .. idx+3
    print(window)
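To get one DataFrame instead of one slice per match, the slices can be concatenated and de-duplicated, since overlapping windows repeat rows. This is a minimal sketch on the question's sample under the same 0-based index assumption; the names windows and combined are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'units': [0, 0, 2, 1, 0, 0, 3, 0],
    'preciptotal': [0.00, 0.01, 0.00, 2.13, 0.00, 0.87, 0.01, 0.40],
})

idx_vals = df[df['preciptotal'] > 1].index
# One positional slice per wet row; max(..., 0) stops a negative start
# from counting back from the end of the frame.
windows = [df.iloc[max(idx - 3, 0): idx + 4] for idx in idx_vals]
# pd.concat fails on an empty list, so fall back to an empty frame.
combined = pd.concat(windows) if windows else df.iloc[0:0]
# Overlapping windows repeat rows: keep the first copy of each index label.
combined = combined[~combined.index.duplicated()].sort_index()

print(combined)
```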

This assumes your index is a 0-based integer index (so index labels equal row positions), which is the case for your sample.
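If the index isn't 0-based, or some days have no row, slicing by position no longer matches slicing by date. A sketch that compares actual dates with pd.Timedelta and Series.between could look like this. It assumes a single store as in the sample (store_nbr is ignored) and treats the window as 3 calendar days on each side, inclusive.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2014-10-11', '2014-10-12', '2014-10-13', '2014-10-14',
             '2014-10-15', '2014-10-16', '2014-10-17', '2014-10-18'],
    'store_nbr': 1,
    'units': [0, 0, 2, 1, 0, 0, 3, 0],
    'preciptotal': [0.00, 0.01, 0.00, 2.13, 0.00, 0.87, 0.01, 0.40],
})
df['date'] = pd.to_datetime(df['date'])

window = pd.Timedelta(days=3)
wet_dates = df.loc[df['preciptotal'] > 1, 'date']

# Start with nothing selected, then OR in every row whose date lies
# within 3 calendar days (inclusive) of each wet date.
mask = pd.Series(False, index=df.index)
for d in wet_dates:
    mask |= df['date'].between(d - window, d + window)

print(df[mask])
```

Because the windows are combined with a boolean OR, overlapping windows around nearby wet dates do not produce duplicate rows.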

Upvotes: 2
