Yogesh Govindan

How to use multiple conditions in Pandas

I am trying to create a new variable based on certain conditions. In the example below, I want to create a new column such that if recency is greater than the mean of recency and frequency is greater than 3 times its standard deviation, I assign 2, else 0. Below are the code and the error:

import pandas as pd
import numpy as np
  
# Initialise data as lists.
data = {'cust':["a1",   "a2",   "a3",   "a4",   "a5",   "a6",   "a7",   "a8",   "a9",   "a10",  "a11",  "a12",  "a13",  "a14",  "a15",  "a16",  "a17",  "a18",  "a19",  "a20",  "a21",  "a22",  "a23",  "a24",  "a25",  "a26",  "a27",  "a28",  "a29",  "a30",  "a31",  "a32",  "a33",  "a34",  "a35",  "a36",  "a37",  "a38",  "a39",  "a40",  "a41",  "a42",  "a43",  "a44",  "a45",  "a46",  "a47",  "a48",  "a49",  "a50",  "a51"],
        'recency':[3,   7,  9,  9,  6,  8,  3,  9,  6,  5,  8,  6,  2,  8,  3,  3,  2,  7,  3,  1,  7,  6,  10, 6,  2,  8,  6,  10, 2,  7,  9,  1,  1,  3,  6,  4,  6,  4,  6,  6,  7,  3,  7,  9,  6,  4,  7,  3,  1,  9,  3],
        'frequency':[15,    9,  13, 9,  19, 1,  11, 20, 20, 15, 15, 18, 1,  9,  20, 14, 11, 11, 4,  15, 1,  8,  17, 19, 13, 20, 1,  11, 3,  8,  2,  4,  15, 5,  12, 15, 20, 6,  19, 2,  6,  12, 6,  6,  4,  7,  2,  3,  20, 13, 11],
       'monetary':[8854,    5614,   2687,   3553,   1801,   1076,   9724,   7778,   8382,   4391,   6766,   9905,   3181,   4170,   7544,   2997,   3025,   9358,   6015,   9919,   5132,   3598,   8779,   4420,   8931,   1492,   5491,   4186,   4720,   2568,   2530,   4618,   4109,   9384,   3000,   9766,   9524,   1027,   6315,   9806,   3442,   7256,   2432,   2429,   7696,   4527,   1802,   6606,   3018,   6295,   2985]}
# Create DataFrame
df = pd.DataFrame(data)

df['cluster']=np.where(df['recency']>df['recency'].mean() & df['frequency']>df['frequency'].mean()+
                       df['frequency'].std() ,2,0)

TypeError: Cannot perform 'rand_' with a dtyped [int64] array and scalar of type [bool]
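Python's standard-library ast module shows how the interpreter groups the unparenthesized condition. In this sketch the placeholder names a, b, c and d are illustrative stand-ins for df['recency'], df['recency'].mean(), df['frequency'] and the mean-plus-std threshold:

```python
import ast

# Placeholders (illustrative only): a = df['recency'], b = df['recency'].mean(),
# c = df['frequency'], d = df['frequency'].mean() + df['frequency'].std()
tree = ast.parse("a > b & c > d", mode="eval")

# The top-level node is ONE chained comparison, a > (b & c) > d,
# because & binds more tightly than >
print(ast.dump(tree.body))

# With explicit parentheses the top-level node is a bitwise AND of two comparisons
fixed = ast.parse("(a > b) & (c > d)", mode="eval")
print(type(fixed.body).__name__, type(fixed.body.op).__name__)  # BinOp BitAnd
```

In the first parse the middle operand is b & c, that is df['recency'].mean() & df['frequency'], and that bitwise AND between a float and an integer Series is where the 'rand_' TypeError above comes from.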


Answers (1)

jezrael

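I think the parentheses around the conditions are missing. Because of operator precedence, & binds more tightly than comparison operators such as >, so each comparison must be wrapped in its own parentheses: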

df['cluster'] = np.where((df['recency']>df['recency'].mean()) & 
                         (df['frequency']>df['frequency'].mean()+df['frequency'].std()),2,0)
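A variant of the same fix, sketched here on a small hypothetical DataFrame rather than the question's 51 rows, gives each condition a name and uses Series.gt() (the method form of >). That sidesteps the precedence issue and makes each mask easy to inspect on its own. The names recent and frequent are illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the question's data), chosen so that
# one row meets only the recency condition and one row meets both
df = pd.DataFrame({
    'cust':      ['b1', 'b2', 'b3', 'b4', 'b5'],
    'recency':   [1, 9, 1, 1, 9],
    'frequency': [1, 1, 1, 1, 20],
})

# Each condition as its own boolean Series; .gt() is the method form of '>',
# so there is no precedence question when the two are combined
recent = df['recency'].gt(df['recency'].mean())
frequent = df['frequency'].gt(df['frequency'].mean() + df['frequency'].std())

# Element-wise AND of the two masks, then 2 where both hold, else 0
df['cluster'] = np.where(recent & frequent, 2, 0)
print(df['cluster'].tolist())  # [0, 0, 0, 0, 2]
```

Row b2 passes the recency test but not the frequency test, so & leaves it at 0; only b5 gets 2.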
