Programmer
Programmer

Reputation: 1294

Complex selecting in pandas dataframe

The dataset below indicates which products are active for each customer, and in what month. Month can take several values (1,2,3,...etc), there are many products (x,y,z,etc), and product_active is binary to indicate active status.

cust_id month product  product_active
1234    1     x        1 
1234    2     x        0 
1234    1     y        0 
1234    2     y        1   

How would I select all customers who have switched from product x to product y, from month 1 to month 2? I would like to generalise this, i.e. be able to select all customers who switched from product a to product b, from month m1 to month m2.

Upvotes: 1

Views: 78

Answers (2)

moys
moys

Reputation: 8033

Check if this works for you

mask = df['product_active'].ne(df['product_active'].shift(2))
a = df[~mask & df.product_active==1]['cust_id']
Cust_not_switching = list(set(df[df['cust_id'].isin(a)]['cust_id'].to_list()))
Cust_switching = list(set(df[~df['cust_id'].isin(a)]['cust_id'].to_list()))

Input

cust_id     month   product     product_active
0   1234    1   x   1
1   1234    2   x   0
2   1234    1   y   0
3   1234    2   y   1
4   1235    1   x   1
5   1235    2   x   1
6   1235    1   y   0
7   1235    2   y   0
8   1236    1   x   0
9   1236    2   x   1
10  1236    1   y   1
11  1236    2   y   0

Output

Cust_not_switching = [1235]
Cust_switching = [1234, 1236]

This will work only if there are 2 products.

Upvotes: 1

ParalysisByAnalysis
ParalysisByAnalysis

Reputation: 733

Ok there may be a more pythonic way to do this using a .groupby() and vector solution, but here is a solution that will provide a df with the results you are looking for. I am making an assumption on your data that your product active column doesn't matter.

#DF Setup
_______________________
col = ['cust_id', 'month', 'product',  'product_active']
data = [
(1234,    1,     'x',        1 ),
(1234,    2,     'x',        0 ),
(1235,    1,     'y',        0 ),
(1235,    2,     'y',        1 ),
(1236,    1,     'x',        1 ),
(1236,    2,     'y',        0 )]
df = pd.DataFrame(data, columns=col)

Added an additional customer (1236) to simulate product change (x->y) from m1 to m2.

#Solution
______________________
result_df = pd.DataFrame()

for i,row in df.iterrows():
    if i == 0:
        pass
    elif df.loc[i-1,'cust_id'] == df.loc[i,'cust_id']:
        if (df.loc[i-1,'month'] == 1) & (df.loc[i,'month'] == 2):
            if (df.loc[i-1,'product'] == 'x') & (df.loc[i,'product'] == 'y'):
                result_df = result_df.append(df.loc[i])

Here is the generic solution wrapped in a function:

def filter_function(month,p1,p2):
    '''
    month - month you wish to check for product change.
    p1 - "From" product
    p2 - "To" product
    '''
    result_df = pd.DataFrame()

    for i,row in df.iterrows():
        if i == 0:
            pass
        elif df.loc[i-1,'cust_id'] == df.loc[i,'cust_id']:
            if (df.loc[i-1,'month'] == month-1) & (df.loc[i,'month'] == month):
                if (df.loc[i-1,'product'] == p1) & (df.loc[i,'product'] == p2):
                    result_df = result_df.append(df.loc[i])
    return result_df

filter_function(2,'x','y')

Upvotes: 1

Related Questions