Reputation: 7583
I have an indexed pandas dataframe. By searching through its index, I find a row of interest. How do I find out the iloc of this row?
Example:
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df
A B C D
2000-01-01 -0.077564 0.310565 1.112333 1.023472
2000-01-02 -0.377221 -0.303613 -1.593735 1.354357
2000-01-03 1.023574 -0.139773 0.736999 1.417595
2000-01-04 -0.191934 0.319612 0.606402 0.392500
2000-01-05 -0.281087 -0.273864 0.154266 0.374022
2000-01-06 -1.953963 1.429507 1.730493 0.109981
2000-01-07 0.894756 -0.315175 -0.028260 -1.232693
2000-01-08 -0.032872 -0.237807 0.705088 0.978011
window_stop_row = df[df.index < '2000-01-04'].iloc[-1]
window_stop_row
Timestamp('2000-01-08 00:00:00', offset='D')
#which is the iloc of window_stop_row?
Upvotes: 46
Views: 59919
Reputation: 394399
Generally speaking, pass the named index value to index.get_loc
:
df.index.get_loc(row_of_interest_named_index)
Since you’re dealing with dates it may be more convenient to retrieve the index value with .name
:
In [131]:
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df
Out[131]:
A B C D
2000-01-01 0.095234 -1.000863 0.899732 -1.742152
2000-01-02 -0.517544 -1.274137 1.734024 -1.369487
2000-01-03 0.134112 1.964386 -0.120282 0.573676
2000-01-04 -0.737499 -0.581444 0.528500 -0.737697
2000-01-05 -1.777800 0.795093 0.120681 0.524045
2000-01-06 -0.048432 -0.751365 -0.760417 -0.181658
2000-01-07 -0.570800 0.248608 -1.428998 -0.662014
2000-01-08 -0.147326 0.717392 3.138620 1.208639
In [133]:
window_stop_row = df[df.index < '2000-01-04'].iloc[-1]
window_stop_row.name
Out[133]:
Timestamp('2000-01-03 00:00:00', offset='D')
In [134]:
df.index.get_loc(window_stop_row.name)
Out[134]:
2
get_loc
returns the ordinal position of the label in your index which is what you want:
In [135]:
df.iloc[df.index.get_loc(window_stop_row.name)]
Out[135]:
A 0.134112
B 1.964386
C -0.120282
D 0.573676
Name: 2000-01-03 00:00:00, dtype: float64
if you just want to search the index then so long as it is sorted then you can use searchsorted
:
In [142]:
df.index.searchsorted('2000-01-04') - 1
Out[142]:
2
Upvotes: 49
Reputation: 6233
While pandas.Index.get_loc()
will only work if you have a single key, the following paradigm will also work getting the iloc
of multiple elements:
np.argwhere(condition).flatten() # array of all iloc where condition is True
In your case, picking the latest element where df.index < '2000-01-04'
:
np.argwhere(df.index < '2000-01-04').flatten()[-1] # returns 2
Upvotes: 9
Reputation: 31692
IIUC you could call index for your case:
In [53]: df[df.index < '2000-01-04'].index[-1]
Out[53]: Timestamp('2000-01-03 00:00:00', offset='D')
EDIT
I think @EdChums answer is what you want. Alternatively you could filter your dataframe with values which you get, then use all
to find the row with that values and then pass it to the index
:
In [67]: df == window_stop_row
Out[67]:
A B C D
2000-01-01 False False False False
2000-01-02 False False False False
2000-01-03 True True True True
2000-01-04 False False False False
2000-01-05 False False False False
2000-01-06 False False False False
2000-01-07 False False False False
2000-01-08 False False False False
In [68]: (df == window_stop_row).all(axis=1)
Out[68]:
2000-01-01 False
2000-01-02 False
2000-01-03 True
2000-01-04 False
2000-01-05 False
2000-01-06 False
2000-01-07 False
2000-01-08 False
Freq: D, dtype: bool
In [69]: df.index[(df == window_stop_row).all(axis=1)]
Out[69]: DatetimeIndex(['2000-01-03'], dtype='datetime64[ns]', freq='D')
Upvotes: 1
Reputation: 432
You could try looping through each row in the dataframe:
for row_number,row in dataframe.iterrows():
if row['column_header'] == YourValue:
print row_number
This will give you the row with which to use the iloc function
Upvotes: -1