Justin Khoo
Justin Khoo

Reputation: 3251

Create a column based on values of multiple rows tied to a key

I have a DataFrame with customers and categories they shop in, and I want to know if a customer is a cross-category shopper (there are only three categories). What is the best way to turn this:

Customer Category
foo      A
foo      B
boo      A
zoo      C

to this:

Customer  A B C CrossCategory
foo       Y Y N  True
boo       Y N N  False
zoo       N N Y  False

Upvotes: 0

Views: 91

Answers (1)

Cleb
Cleb

Reputation: 25997

In case you are only interested in the column CrossCategory, you can use:

df.groupby('Customer')['Category'].nunique() > 1

which gives

Customer
boo    False
foo     True
zoo    False
Name: Category, dtype: bool

For the pivot part:

You can add another column:

df['visited'] = 'Y'

and then use pivot

p = df.pivot(index='Customer', columns='Category', values='visited').fillna('N')

Category  A  B  C
Customer         
boo       Y  N  N
foo       Y  Y  N
zoo       N  N  Y

Then we use a second approach to get the CrossCategory column

# alternative approach to the nunique part from above
df['CrossCategory'] = df.groupby('Customer')['Category'].transform('size') > 1

  Customer Category visited  CrossCategory
0      foo        A       Y           True
1      foo        B       Y           True
2      boo        A       Y          False
3      zoo        C       Y          False

and finally concatenate the two dataframes

pd.concat([p, df[['Customer', 'CrossCategory']].drop_duplicates().set_index('Customer')], sort=False, axis=1)

That yields:

     A  B  C  CrossCategory
boo  Y  N  N          False
foo  Y  Y  N           True
zoo  N  N  Y          False

I am pretty sure that there are more elegant solutions, but that gives the desired output.

If you have big datasets, you might want to run groupby with sort=False.

Upvotes: 2

Related Questions