Reputation: 3251
I have a DataFrame with customers and categories they shop in, and I want to know if a customer is a cross-category shopper (there are only three categories). What is the best way to turn this:
Customer Category
foo A
foo B
boo A
zoo C
to this:
Customer A B C CrossCategory
foo Y Y N True
boo Y N N False
zoo N N Y False
Upvotes: 0
Views: 91
Reputation: 25997
In case you are only interested in the column CrossCategory
, you can use:
df.groupby('Customer')['Category'].nunique() > 1
which gives
Customer
boo False
foo True
zoo False
Name: Category, dtype: bool
For the pivot
part:
You can add another column:
df['visited'] = 'Y'
and then use pivot
p = df.pivot(index='Customer', columns='Category', values='visited').fillna('N')
Category A B C
Customer
boo Y N N
foo Y Y N
zoo N N Y
Then we use a second approach to get the CrossCategory
column
# alternative approach to the nunique part from above
df['CrossCategory'] = df.groupby('Customer')['Category'].transform('size') > 1
Customer Category visited CrossCategory
0 foo A Y True
1 foo B Y True
2 boo A Y False
3 zoo C Y False
and finally concatenate the two dataframes
pd.concat([p, df[['Customer', 'CrossCategory']].drop_duplicates().set_index('Customer')], sort=False, axis=1)
That yields:
A B C CrossCategory
boo Y N N False
foo Y Y N True
zoo N N Y False
I am pretty sure that there are more elegant solutions, but that gives the desired output.
If you have big datasets, you might want to run groupby
with sort=False
.
Upvotes: 2