Reputation: 17144
I have data with some sequential features which must come before other feature comes. I would like to get the final page visited by the user.
import numpy as np
import pandas as pd
df = pd.DataFrame({'user': [10,15,17],
'sex': ['M','M','F'],
'home_page': [1,1,1],
'search_page': [1,0,1],
'confirmation_page': [1,0,0],
'payment_page':[1,0,0]})
print(df)
user sex home_page search_page confirmation_page payment_page
0 10 M 1 1 1 1
1 15 M 1 0 0 0
2 17 F 1 1 0 0
How to get new column with name 'final_page' that have the name of final page visited.
Required answer
df['final_page'] = ['payment_page','home_page','search_page'] # this is not answer,
# The new column should have these values.
a = df.iloc[:,2:].to_numpy()
np.trim_zeros(a)
Find last non-zero element's index in pandas series
Upvotes: 0
Views: 275
Reputation: 75080
You can use a dot
product of columns with condition where df!=0
and then split and get the last column:
m=df.set_index(['user','sex'],append=True)
df['final_page']=(m.ne(0).dot(m.columns+ ',').str.rstrip(',').str.split(',')
.str[-1].droplevel(['user','sex']))
print(df)
Or:
df['final_page']=m.apply(pd.Series.last_valid_index,axis=1).reset_index(drop=True)
user sex home_page search_page confirmation_page payment_page \
0 10 M 1 1 1 1
1 15 M 1 0 0 0
2 17 F 1 1 0 0
final_page
0 payment_page
1 home_page
2 search_page
Upvotes: 1
Reputation: 17144
Using numpy:
import numpy as np
import pandas as pd
df = pd.DataFrame({'user': [10,15,17],
'sex': ['M','M','F'],
'home_page': [1,1,1],
'search_page': [1,0,1],
'confirmation_page': [1,0,0],
'payment_page':[1,0,0]})
pages = df.columns[2:]
df['final_page'] = df.iloc[:,2:].apply(lambda x: pages[np.max(np.nonzero(x))],axis=1)
print(df)
Result:
user sex home_page search_page confirmation_page payment_page \
0 10 M 1 1 1 1
1 15 M 1 0 0 0
2 17 F 1 1 0 0
final_page
0 payment_page
1 home_page
2 search_page
Upvotes: 0