Reputation: 4498
Working with python pandas dataframe df:
product_id |transaction_id | category | color
234 54 A black
349 54 B silver
213 46 A silver
490 46 A black
245 87 A black
249 87 B black
294 87 A silver
I want to flag transaction_IDs that have category of A and B with the same color. So in the scenario above transaction 87 has a product A black and a product B black.
Desired output:
product_id |transaction_id | category | color | flag
234 54 A black
349 54 B silver
213 46 A silver
490 46 A black
245 87 A black X
249 87 B black X
294 87 A silver X
I was trying to create a unique key between category and color and then groupby, but it got messy and I still have to go manually through it. There must be a simpler way.
df['key']=df['category']&df['color']
df['transaction_analysis']= df.groupby('transaction_id').key.transform(lambda x : '&'.join(set(x)))
Upvotes: 1
Views: 449
Reputation: 879561
Here's another way to do it:
import pandas as pd
df = pd.DataFrame({'category': ['A', 'B', 'A', 'A', 'A', 'B', 'A'],
'color': ['black', 'silver', 'silver', 'black',
'black', 'black', 'silver'],
'product_id': [234, 349, 213, 490, 245, 249, 294],
'transaction_id': [54, 54, 46, 46, 87, 87, 87]})
pivoted = df.pivot_table(index=['transaction_id','color'], columns=['category'],
values='product_id')
transaction_color_mask = pd.notnull(pivoted).all(axis=1)
transaction_map = transaction_color_mask.groupby(level=0).any().map({True:'X',False:''})
df['flag'] = df['transaction_id'].map(transaction_map)
print(df)
yields
category color product_id transaction_id flag
0 A black 234 54
1 B silver 349 54
2 A silver 213 46
3 A black 490 46
4 A black 245 87 X
5 B black 249 87 X
6 A silver 294 87 X
The main idea is to use pivot_table
to expose the values-to-be-compared in rows:
In [182]: pivoted
Out[182]:
category A B
transaction_id color
46 black 490.0 NaN
silver 213.0 NaN
54 black 234.0 NaN
silver NaN 349.0
87 black 245.0 249.0
silver 294.0 NaN
Now we can find rows whose categories A
and B
have the same color:
In [183]: transaction_color_mask = pd.notnull(pivoted).all(axis=1); transaction_color_mask
Out[183]:
transaction_id color
46 black False
silver False
54 black False
silver False
87 black True
silver False
dtype: bool
Group by transaction_id
to find transaction_id
s associated with True rows:
In [184]: transaction_color_mask.groupby(level=0).any()
Out[184]:
transaction_id
46 False
54 False
87 True
dtype: bool
then map True
to 'X'
and False
to empty strings:
In [185]: transaction_color_mask.groupby(level=0).any().map({True:'X',False:''})
Out[185]:
transaction_id
46
54
87 X
dtype: object
Finally, recombine the result with df
: df['flag'] = df['transaction_id'].map(transaction_map)
yields the desired result.
Upvotes: 1
Reputation: 402493
Not sure if simpler, but certainly cleaner. You may groupby
on transaction_id
and category
, find unique colours with unique
, and then unstack.
After this, generate a mapping of flag values and assign to df
later.
v = (
df.groupby(['transaction_id', 'category'])
.color
.unique()
.unstack(fill_value=set())
)
m = {
k : 'X' if set(x).intersection(y) else '' for k, x, y in zip(v.index, v.A, v.B)
}
df['flag'] = df['transaction_id'].map(m)
df
product_id transaction_id category color flag
0 234 54 A black
1 349 54 B silver
2 213 46 A silver
3 490 46 A black
4 245 87 A black X
5 249 87 B black X
6 294 87 A silver X
Upvotes: 1