Reputation: 542
I have a DataFrame with a DatetimeIndex, and a 2D NumPy array X that holds just the values of that DataFrame. I want to select some rows through the DataFrame's index:
dataset[from_d:to_d]
where from_d and to_d are Timestamps. The dataset is sliced just fine, but I need the positions of those rows inside the dataset, because I then want to select the same rows in the NumPy array X. Something like:
X[indexes]
I have tried np.where(dataset[from_d:to_d])[0], but it somehow gives me an array of shape (23149590,) when the dataset has shape (15075, 13117). Is there a better way to do this than with np.where?
Upvotes: 0
Views: 1726
Reputation: 863331
Use Index.get_indexer:
rng = pd.date_range('2017-04-03', periods=10)
dataset = pd.DataFrame({'a': range(10)}, index=rng)
print (dataset)
a
2017-04-03 0
2017-04-04 1
2017-04-05 2
2017-04-06 3
2017-04-07 4
2017-04-08 5
2017-04-09 6
2017-04-10 7
2017-04-11 8
2017-04-12 9
from_d = '2017-04-05'
to_d = '2017-04-10'
print (dataset[from_d:to_d])
a
2017-04-05 2
2017-04-06 3
2017-04-07 4
2017-04-08 5
2017-04-09 6
2017-04-10 7
indexes = dataset.index.get_indexer(dataset[from_d:to_d].index)
print (indexes)
[2 3 4 5 6 7]
Alternatively, if the index is sorted, Index.searchsorted gives the same positions:
indexes = dataset.index.searchsorted(dataset[from_d:to_d].index)
print (indexes)
[2 3 4 5 6 7]
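To connect this back to the question, here is a minimal sketch of applying the positions to the NumPy array. It assumes X was built from the same DataFrame (dataset.to_numpy() stands in for the asker's X) and that the index is sorted. Under that assumption the selected rows form one contiguous block, so Index.slice_indexer can return a positional slice directly, without slicing the DataFrame first.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2017-04-03', periods=10)
dataset = pd.DataFrame({'a': range(10)}, index=rng)
X = dataset.to_numpy()  # assumption: stand-in for the asker's X array

from_d = pd.Timestamp('2017-04-05')
to_d = pd.Timestamp('2017-04-10')

# Integer positions of the label-sliced rows, as in the answer above
indexes = dataset.index.get_indexer(dataset.loc[from_d:to_d].index)

# On a sorted index, slice_indexer converts the label bounds
# into a positional slice (both endpoints included)
pos = dataset.index.slice_indexer(from_d, to_d)

rows_by_array = X[indexes]  # fancy indexing, returns a copy
rows_by_slice = X[pos]      # basic slicing, returns a view
print(rows_by_array.ravel())
```

The slice version avoids building an intermediate integer array, which may matter for a dataset of the size mentioned in the question.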
EDIT:
For a non-unique DatetimeIndex, this is possible by taking the unique index values and passing them to Index.get_indexer_for:
rng = pd.date_range('2017-04-03', periods=10)
dataset = pd.DataFrame({'a': range(20)}, index=rng.append(rng)).sort_index()
print (dataset)
a
2017-04-03 0
2017-04-03 10
2017-04-04 1
2017-04-04 11
2017-04-05 2
2017-04-05 12
2017-04-06 3
2017-04-06 13
2017-04-07 4
2017-04-07 14
2017-04-08 5
2017-04-08 15
2017-04-09 6
2017-04-09 16
2017-04-10 17
2017-04-10 7
2017-04-11 18
2017-04-11 8
2017-04-12 9
2017-04-12 19
from_d = '2017-04-05'
to_d = '2017-04-10'
i = dataset[from_d:to_d].index.unique()
print (i)
DatetimeIndex(['2017-04-05', '2017-04-06', '2017-04-07', '2017-04-08',
'2017-04-09', '2017-04-10'],
dtype='datetime64[ns]', freq=None)
indexes = dataset.index.get_indexer_for(i)
print (indexes)
[ 4 5 6 7 8 9 10 11 12 13 14 15]
Verify indexes:
print (dataset.iloc[indexes])
a
2017-04-05 2
2017-04-05 12
2017-04-06 3
2017-04-06 13
2017-04-07 4
2017-04-07 14
2017-04-08 5
2017-04-08 15
2017-04-09 6
2017-04-09 16
2017-04-10 17
2017-04-10 7
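Another option that also works with duplicate timestamps is a boolean mask on the index, sketched below. np.where on the sliced DataFrame tests every cell value rather than the index, which is likely why the attempt in the question returned one entry per non-zero cell. Testing the index instead gives one entry per row. As before, X here is only a stand-in built from the example DataFrame.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2017-04-03', periods=10)
dataset = pd.DataFrame({'a': range(20)}, index=rng.append(rng)).sort_index()
X = dataset.to_numpy()  # assumption: stand-in for the asker's X array

from_d = pd.Timestamp('2017-04-05')
to_d = pd.Timestamp('2017-04-10')

# One boolean per row: True where the timestamp lies in [from_d, to_d]
mask = (dataset.index >= from_d) & (dataset.index <= to_d)

# Positions of the True entries, i.e. row numbers 4 through 15 here
positions = np.flatnonzero(mask)
print(positions)

selected = X[positions]  # X[mask] selects the same rows
```

Unlike searchsorted or slice_indexer, the mask does not require the index to be sorted.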
Upvotes: 2