Brandon

Reputation: 542

Finding row positions through datetime index in pandas dataframe

I have a DataFrame with a DatetimeIndex, and a 2D NumPy array X that holds just the values of that DataFrame. I want to select some rows through the DataFrame's index:

dataset[from_d:to_d]

Here from_d and to_d are Timestamps. The dataset is sliced just fine, but I need the positions of those rows inside the dataset, because I then want to select the same rows in the X NumPy array. Something like:

X[indexes]

I have tried np.where(dataset[from_d:to_d])[0], but it somehow gives me an array of shape (23149590,) when the dataset has shape (15075, 13117). Is there a better way to do this than with where?

Upvotes: 0

Views: 1726

Answers (1)

jezrael

Reputation: 863331

Use Index.get_indexer:

import pandas as pd

rng = pd.date_range('2017-04-03', periods=10)
dataset = pd.DataFrame({'a': range(10)}, index=rng)
print(dataset)
            a
2017-04-03  0
2017-04-04  1
2017-04-05  2
2017-04-06  3
2017-04-07  4
2017-04-08  5
2017-04-09  6
2017-04-10  7
2017-04-11  8
2017-04-12  9

from_d = '2017-04-05'
to_d = '2017-04-10'
print (dataset[from_d:to_d])
            a
2017-04-05  2
2017-04-06  3
2017-04-07  4
2017-04-08  5
2017-04-09  6
2017-04-10  7

indexes = dataset.index.get_indexer(dataset[from_d:to_d].index)
print (indexes)
[2 3 4 5 6 7]
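
To connect this back to the question, here is a minimal sketch of using those positions to pick the same rows out of the NumPy array. It assumes X was built from the DataFrame's values (here via dataset.to_numpy(), which is my assumption about how X was created) and that the index is unique, as in the sample above.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2017-04-03', periods=10)
dataset = pd.DataFrame({'a': range(10)}, index=rng)

# Assumption: X holds the same rows as the DataFrame, in the same order
X = dataset.to_numpy()

from_d = pd.Timestamp('2017-04-05')
to_d = pd.Timestamp('2017-04-10')

# Positions of the sliced rows within the full index
indexes = dataset.index.get_indexer(dataset.loc[from_d:to_d].index)

# Integer-array indexing selects the matching rows of X
X_rows = X[indexes]
print(X_rows.ravel())
```

Because X and dataset share row order, X[indexes] and dataset.iloc[indexes].to_numpy() should hold the same values.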

Or Index.searchsorted, which requires the index to be sorted:

indexes = dataset.index.searchsorted(dataset[from_d:to_d].index)
print (indexes)
[2 3 4 5 6 7]
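
If the index is sorted, another option that may be worth trying is Index.slice_indexer, which turns the two boundary labels into a positional slice without building the intermediate sliced DataFrame. This is a sketch rather than a drop-in solution; X is again assumed to be dataset.to_numpy().

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2017-04-03', periods=10)
dataset = pd.DataFrame({'a': range(10)}, index=rng)

# Assumption: X holds the same rows as the DataFrame, in the same order
X = dataset.to_numpy()

from_d = pd.Timestamp('2017-04-05')
to_d = pd.Timestamp('2017-04-10')

# Positional slice covering the label range; both end labels are included
pos = dataset.index.slice_indexer(from_d, to_d)
print(pos.start, pos.stop)

# Basic slicing of a NumPy array returns a view, not a copy
X_rows = X[pos]
print(X_rows.ravel())
```

A slice is cheaper than an integer array for a large contiguous range, but X[pos] is then a view on X, so copy it before modifying it in place if X must stay unchanged.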

EDIT:

For a non-unique DatetimeIndex, take the unique labels of the slice and pass them to Index.get_indexer_for:

rng = pd.date_range('2017-04-03', periods=10) 
dataset = pd.DataFrame({'a': range(20)}, index=rng.append(rng)).sort_index()
print (dataset)
             a
2017-04-03   0
2017-04-03  10
2017-04-04   1
2017-04-04  11
2017-04-05   2
2017-04-05  12
2017-04-06   3
2017-04-06  13
2017-04-07   4
2017-04-07  14
2017-04-08   5
2017-04-08  15
2017-04-09   6
2017-04-09  16
2017-04-10  17
2017-04-10   7
2017-04-11  18
2017-04-11   8
2017-04-12   9
2017-04-12  19

from_d = '2017-04-05'
to_d = '2017-04-10'

i = dataset[from_d:to_d].index.unique()
print (i)
DatetimeIndex(['2017-04-05', '2017-04-06', '2017-04-07', '2017-04-08',
               '2017-04-09', '2017-04-10'],
              dtype='datetime64[ns]', freq=None)

indexes = dataset.index.get_indexer_for(i)
print (indexes)
[ 4  5  6  7  8  9 10 11 12 13 14 15]

Verify indexes:

print (dataset.iloc[indexes])
             a
2017-04-05   2
2017-04-05  12
2017-04-06   3
2017-04-06  13
2017-04-07   4
2017-04-07  14
2017-04-08   5
2017-04-08  15
2017-04-09   6
2017-04-09  16
2017-04-10  17
2017-04-10   7
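
As a cross-check for the non-unique case, the same positions can also be obtained from a boolean mask on the index. This is an alternative sketch, not part of the approach above; it compares the index against the two boundary Timestamps and does not depend on the index being unique or sorted.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2017-04-03', periods=10)
dataset = pd.DataFrame({'a': range(20)}, index=rng.append(rng)).sort_index()

from_d = pd.Timestamp('2017-04-05')
to_d = pd.Timestamp('2017-04-10')

# True for every row whose label falls inside the closed interval
mask = (dataset.index >= from_d) & (dataset.index <= to_d)

# Positions of the True entries
indexes = np.flatnonzero(mask)
print(indexes)
```

The boolean mask can also index the NumPy array directly, for example X[mask], when the integer positions themselves are not needed.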

Upvotes: 2
