Reputation: 1294
The dataset below indicates which products are active for each customer, and in what month. Month can take several values (1,2,3,...etc), there are many products (x,y,z,etc), and product_active is binary to indicate active status.
cust_id month product product_active
1234 1 x 1
1234 2 x 0
1234 1 y 0
1234 2 y 1
How would I select all customers who have switched from product x to product y, from month 1 to month 2? I would like to generalise this, i.e. be able to select all customers who switched from product a to product b, from month m1 to month m2.
Upvotes: 1
Views: 78
Reputation: 8033
Check if this works for you
mask = df['product_active'].ne(df['product_active'].shift(2))
a = df[~mask & df.product_active==1]['cust_id']
Cust_not_switching = list(set(df[df['cust_id'].isin(a)]['cust_id'].to_list()))
Cust_switching = list(set(df[~df['cust_id'].isin(a)]['cust_id'].to_list()))
Input
cust_id month product product_active
0 1234 1 x 1
1 1234 2 x 0
2 1234 1 y 0
3 1234 2 y 1
4 1235 1 x 1
5 1235 2 x 1
6 1235 1 y 0
7 1235 2 y 0
8 1236 1 x 0
9 1236 2 x 1
10 1236 1 y 1
11 1236 2 y 0
Output
Cust_not_switching = [1235]
Cust_switching = [1234, 1236]
This will work only if there are 2 products.
Upvotes: 1
Reputation: 733
Ok there may be a more pythonic way to do this using a .groupby()
and vector solution, but here is a solution that will provide a df with the results you are looking for. I am making an assumption on your data that your product active column doesn't matter.
#DF Setup
_______________________
col = ['cust_id', 'month', 'product', 'product_active']
data = [
(1234, 1, 'x', 1 ),
(1234, 2, 'x', 0 ),
(1235, 1, 'y', 0 ),
(1235, 2, 'y', 1 ),
(1236, 1, 'x', 1 ),
(1236, 2, 'y', 0 )]
df = pd.DataFrame(data, columns=col)
Added an additional customer (1236) to simulate product change (x->y) from m1 to m2.
#Solution
______________________
result_df = pd.DataFrame()
for i,row in df.iterrows():
if i == 0:
pass
elif df.loc[i-1,'cust_id'] == df.loc[i,'cust_id']:
if (df.loc[i-1,'month'] == 1) & (df.loc[i,'month'] == 2):
if (df.loc[i-1,'product'] == 'x') & (df.loc[i,'product'] == 'y'):
result_df = result_df.append(df.loc[i])
Here is the generic solution wrapped in a function:
def filter_function(month,p1,p2):
'''
month - month you wish to check for product change.
p1 - "From" product
p2 - "To" product
'''
result_df = pd.DataFrame()
for i,row in df.iterrows():
if i == 0:
pass
elif df.loc[i-1,'cust_id'] == df.loc[i,'cust_id']:
if (df.loc[i-1,'month'] == month-1) & (df.loc[i,'month'] == month):
if (df.loc[i-1,'product'] == p1) & (df.loc[i,'product'] == p2):
result_df = result_df.append(df.loc[i])
return result_df
filter_function(2,'x','y')
Upvotes: 1