Reputation: 37
I have two data frames. One includes the sales for Product A. The other has the sales for Product B.
I'd like to figure out which Customers only bought Product A, which customers only bought Product B, and which customers bought both.
DF 1:
Cust_ID Industry Product_Type
ABC Manuf Product A
ACB Tech Product A
BAD Manuf Product A
DAC Tech Product A
DEF Retail Product A
DF 2:
Cust_ID Industry Product_Type
DAC Prof Serv Product B
BAD Manuf Product B
BCA Tech Product B
FED Trade Product B
DEF Retail Product B
I'd like the output to be:
BothDF:
Cust_ID Industry
BAD Manuf
DEF Retail
Only_Prod_A_DF:
Cust_ID Industry
ABC Manuf
ACB Tech
DAC Tech
Only_Prod_B_DF:
Cust_ID Industry
DAC Prof Serv
BCA Tech
FED Trade
I've figured out the intersection between the two data frames with the following code:
Sales = pd.merge(Sales_Prod_A, Sales_Prod_B, how='inner', on=['Cust_ID'])
Any thoughts on the customers who bought one or the other? Thank you!
Upvotes: 1
Views: 44
Reputation: 863801
Add indicator=True
and suffixes
parameters to merge
in first step:
Sales = pd.merge(Sales_Prod_A,
Sales_Prod_B,
how='outer',
on=['Cust_ID'],
indicator=True,
suffixes=('','_'))
print (Sales)
Cust_ID Industry Product_Type Industry_ Product_Type_ _merge
0 ABC Manuf Product A NaN NaN left_only
1 ACB Tech Product A NaN NaN left_only
2 BAD Manuf Product A Manuf Product B both
3 DAC Tech Product A Prof Serv Product B both
4 DEF Retail Product A Retail Product B both
5 BCA NaN NaN Tech Product B right_only
6 FED NaN NaN Trade Product B right_only
And then filter by DataFrame.loc
with boolean indexing
and Series.eq
, last rename
columns names for remove _x
and _y
:
BothDF = Sales.loc[Sales['_merge'].eq('both'), ['Cust_ID','Industry']]
print (BothDF)
Cust_ID Industry
2 BAD Manuf
3 DAC Tech
4 DEF Retail
Only_Prod_A_DF = Sales.loc[Sales['_merge'].eq('left_only'), ['Cust_ID','Industry']]
print (Only_Prod_A_DF)
Cust_ID Industry
0 ABC Manuf
1 ACB Tech
Only_Prod_B_DF = (Sales.loc[Sales['_merge'].eq('right_only'), ['Cust_ID','Industry_']]
.rename(columns=lambda x: x.strip('_')))
print (Only_Prod_B_DF)
Cust_ID Industry
5 BCA Tech
6 FED Trade
Upvotes: 4