Reputation: 2680
I have a dataframe which is date sequenced and has 'x' values in one column when there is new information on a particular date.
I want to get the index value of the row for the date before the most recent new information date so I can reference that data for further operations
So my dataframe looks like this:
original_df
index date value newinfo
0 '2007-12-01' 75 Nan
1 '2007-12-02' 75 Nan
2 '2007-12-03' 83 x
3 '2007-12-04' 83 Nan
4 '2007-12-05' 83 Nan
5 '2007-12-06' 47 x
6 '2007-12-07' 47 Nan
7 '2007-12-08' 47 Nan
8 '2007-12-09' 47 Nan
So I'm interested in referencing row where original_df.index == 4
for some further operations.
The only way I can think of doing it is very 'clunky'. Basically I create another dataframe by filtering my original for rows where newinfo == 'x'
, take the index value of the last row, subtract 1, and use that value to access various columns in that row of the original dataframe using iloc
. Code looks like this:
interim_df = original_df[original_df['newinfo']=='x']
index_ref_value = interim_df.index[-1] - 1
This returns an index_ref_value
of 4
.
I can then access value
in original_df
as follows:
original_df.iloc[index_ref_value,1]
In other words, I'm access the value
for 2007-12-05
, the day before the most recent newinfo
.
This gets the job done but strikes me as complicated and sloppy. Is there a cleaner, easier, more Pythonic way to find the index_ref_value I'm looking for?
Upvotes: 2
Views: 2652
Reputation: 5136
you can combine iloc
and loc
into one statement:
original_df.iloc[original_df.loc[original_df['newinfo'] == 'x'].index-1]
the loc
statement is taking the index of where the condition (where newinfo
is x
) and then getting the index of that value. iloc
then takes those indexes and givies you the result you are looking for
judging from your quesiton, you may need a list of these values in the futre. try df1.iloc[df1.loc[df1['newinfo'] == 'x'].index-1].index.tolist()
edit to get the desired output:
original_df.iloc[original_df.loc[original_df['newinfo'] == 'x'].index[-1]-1]
# added a [0] at the end below to get just the value of `4`
original_df.iloc[original_df.loc[original_df['newinfo'] == 'x'].index[-1]-1][0]
Upvotes: 3