Windstorm1981
Windstorm1981

Reputation: 2680

Python Pandas Select Index Value Referencing String Value in a Column

I have a dataframe which is date sequenced and has 'x' values in one column when there is new information on a particular date.

I want to get the index value of the row for the date before the most recent new information date so I can reference that data for further operations

So my dataframe looks like this:

original_df

index      date        value   newinfo
  0     '2007-12-01'     75      Nan
  1     '2007-12-02'     75      Nan
  2     '2007-12-03'     83       x
  3     '2007-12-04'     83      Nan
  4     '2007-12-05'     83      Nan
  5     '2007-12-06'     47       x
  6     '2007-12-07'     47      Nan
  7     '2007-12-08'     47      Nan
  8     '2007-12-09'     47      Nan

So I'm interested in referencing row where original_df.index == 4 for some further operations.

The only way I can think of doing it is very 'clunky'. Basically I create another dataframe by filtering my original for rows where newinfo == 'x', take the index value of the last row, subtract 1, and use that value to access various columns in that row of the original dataframe using iloc. Code looks like this:

interim_df = original_df[original_df['newinfo']=='x']
index_ref_value = interim_df.index[-1] - 1

This returns an index_ref_value of 4.

I can then access value in original_df as follows:

original_df.iloc[index_ref_value,1]

In other words, I'm access the value for 2007-12-05, the day before the most recent newinfo.

This gets the job done but strikes me as complicated and sloppy. Is there a cleaner, easier, more Pythonic way to find the index_ref_value I'm looking for?

Upvotes: 2

Views: 2652

Answers (1)

MattR
MattR

Reputation: 5136

you can combine iloc and loc into one statement:

original_df.iloc[original_df.loc[original_df['newinfo'] == 'x'].index-1]

the loc statement is taking the index of where the condition (where newinfo is x) and then getting the index of that value. iloc then takes those indexes and givies you the result you are looking for

judging from your quesiton, you may need a list of these values in the futre. try df1.iloc[df1.loc[df1['newinfo'] == 'x'].index-1].index.tolist()

edit to get the desired output:

original_df.iloc[original_df.loc[original_df['newinfo'] == 'x'].index[-1]-1]

# added a [0] at the end below to get just the value of `4`
original_df.iloc[original_df.loc[original_df['newinfo'] == 'x'].index[-1]-1][0]

Upvotes: 3

Related Questions