SomeGuy30145
SomeGuy30145

Reputation: 85

Index Value of Last Matching Row Python Panda DataFrame

I have a dataframe which has a value of either 0 or 1 in a "column 2", and either a 0 or 1 in "column 1", I would somehow like to find and append as a column the index value for the last row where Column1 = 1 but only for rows where column 2 = 1. This might be easier to see than read:

d = {'C1' : pd.Series([1, 0, 1,0,0], index=[1,2,3,4,5]),'C2' : pd.Series([0, 0,0,1,1], index=[1,2,3,4,5])}
df = pd.DataFrame(d)
print(df)

   C1  C2 
1   1   0   
2   0   0   
3   1   0   
4   0   1  
5   0   1

#I've left out my attempts as they don't even get close
df['C3'] = IF C2 = 1: Call Function that gives Index Value of last place where C1 = 1 Else 0 End 

This would result in this result set:

   C1  C2  C3
1   1   0   0
2   0   0   0
3   1   0   0
4   0   1   3
5   0   1   3

I was trying to get a function to do this as there are roughly 2million rows in my data set but only ~10k where C2 =1.

Thank you in advance for any help, I really appreciate it - I only started programming with python a few weeks ago.

Upvotes: 1

Views: 975

Answers (1)

firelynx
firelynx

Reputation: 32214

It is not so straight forward, you have to do a few loops to get this result. The key here is the fillna method which can do forwards and backwards filling.

It is often the case that pandas methods does more than one thing, this makes it very hard to figure out what methods to use for what.

So let me talk you through this code.

First we need to set C3 to nan, otherwise we cannot use fillna later.

Then we set C3 to be the index but only where C1 == 1 (the mask does this)

After this we can use fillna with method='ffill' to propagate the last observation forwards.

Then we have to mask away all the values where C2 == 0, same way we set the index earlier, with a mask.

df['C3'] = pd.np.nan
mask = df['C1'] == 1
df['C3'].loc[mask] = df.index[mask].copy()
df['C3'] = df['C3'].fillna(method='ffill')
mask = df['C2'] == 0
df['C3'].loc[mask] = 0
df

    C1  C2  C3
1   1   0   0
2   0   0   0
3   1   0   0
4   0   1   3
5   0   1   3

EDIT:

Added a .copy() to the index, otherwise we overwrite it and the index gets all full of zeroes.

Upvotes: 2

Related Questions