BhishanPoudel
BhishanPoudel

Reputation: 17144

How to get the column name of last non zero value in pandas?

I have data with some sequential features which must come before other feature comes. I would like to get the final page visited by the user.

Setup

import numpy as np
import pandas as pd


df = pd.DataFrame({'user': [10,15,17],
                  'sex': ['M','M','F'],
                  'home_page': [1,1,1],
                  'search_page': [1,0,1],
                  'confirmation_page': [1,0,0],
                  'payment_page':[1,0,0]})
print(df)

   user sex  home_page  search_page  confirmation_page  payment_page
0    10   M          1            1                  1             1
1    15   M          1            0                  0             0
2    17   F          1            1                  0             0

Question

How to get new column with name 'final_page' that have the name of final page visited.

Required answer

df['final_page'] = ['payment_page','home_page','search_page'] # this is not answer,
# The new column should have these values.

my attempt

a = df.iloc[:,2:].to_numpy()
np.trim_zeros(a)

realted links

Find last non-zero element's index in pandas series

Upvotes: 0

Views: 275

Answers (2)

anky
anky

Reputation: 75080

You can use a dot product of columns with condition where df!=0 and then split and get the last column:

m=df.set_index(['user','sex'],append=True)
df['final_page']=(m.ne(0).dot(m.columns+ ',').str.rstrip(',').str.split(',')
                                     .str[-1].droplevel(['user','sex']))
print(df)

Or:

df['final_page']=m.apply(pd.Series.last_valid_index,axis=1).reset_index(drop=True)

   user sex  home_page  search_page  confirmation_page  payment_page  \
0    10   M          1            1                  1             1   
1    15   M          1            0                  0             0   
2    17   F          1            1                  0             0   

     final_page  
0  payment_page  
1     home_page  
2   search_page  

Upvotes: 1

BhishanPoudel
BhishanPoudel

Reputation: 17144

Using numpy:

import numpy as np
import pandas as pd

df = pd.DataFrame({'user': [10,15,17],
                  'sex': ['M','M','F'],
                  'home_page': [1,1,1],
                  'search_page': [1,0,1],
                  'confirmation_page': [1,0,0],
                  'payment_page':[1,0,0]})
pages = df.columns[2:]

df['final_page'] = df.iloc[:,2:].apply(lambda x: pages[np.max(np.nonzero(x))],axis=1)

print(df)

Result:

   user sex  home_page  search_page  confirmation_page  payment_page  \
0    10   M          1            1                  1             1   
1    15   M          1            0                  0             0   
2    17   F          1            1                  0             0   

     final_page  
0  payment_page  
1     home_page  
2   search_page 

Upvotes: 0

Related Questions