Reputation: 988
I have two dataframes, shown below. When a person buys something, I want to recommend related add-on products.
df1 lists the items bought by each person, and df2 lists the recommended add-on products for each item. For example, "Gopu" buys a bun, so I have to recommend "butter" and "jam".
If an item has no added-product entry in df2, it should not appear in the output. For example, "Gopu" buys "biscuit", but df2 has no add-on to recommend for it, so it does not appear in the output table.
A simple left join of df1 with df2 is not working for me.
df1:
name product
Gopu biscuit
Gopu bun
Gopu ink
Aish ball
Aish doll
Aish bun
Aish ink
Colin bun
Colin handsanitize
Colin paper
df2:
product added-product
bun butter
bun jam
ink cloth
ink bib
paper pen
doll barbie
Expected output:
Name added-product
Gopu butter
Gopu jam
Gopu cloth
Gopu bib
Aish barbie
Aish butter
Aish jam
Aish cloth
Aish bib
Colin butter
Colin jam
Colin pen
Thanks.
Upvotes: 0
Views: 33
Reputation: 1710
# An inner join keeps only the purchases that have a matching add-on in df2.
dfnew = df1.join(df2, df1["product"] == df2["product"], "inner").select("name", "added-product").orderBy("name")
dfnew.show()
+-----+-------------+
| name|added-product|
+-----+-------------+
| Aish|       butter|
| Aish|          jam|
| Aish|        cloth|
| Aish|          bib|
| Aish|       barbie|
|Colin|          jam|
|Colin|          pen|
|Colin|       butter|
| Gopu|       butter|
| Gopu|        cloth|
| Gopu|          jam|
| Gopu|          bib|
+-----+-------------+
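To see why a left join does not give the expected table while a join that keeps only matching rows does, here is a small Spark-free sketch in plain Python. It only illustrates the join semantics on the sample rows from the question. The helper name join_addons and its how parameter are made up for this example and are not part of any Spark API.

```python
from collections import defaultdict

# Sample rows copied from the question (df1 and df2).
purchases = [
    ("Gopu", "biscuit"), ("Gopu", "bun"), ("Gopu", "ink"),
    ("Aish", "ball"), ("Aish", "doll"), ("Aish", "bun"), ("Aish", "ink"),
    ("Colin", "bun"), ("Colin", "handsanitize"), ("Colin", "paper"),
]
addons = [
    ("bun", "butter"), ("bun", "jam"), ("ink", "cloth"),
    ("ink", "bib"), ("paper", "pen"), ("doll", "barbie"),
]

def join_addons(purchases, addons, how="inner"):
    """Hypothetical helper: join purchases to add-ons on the product key."""
    # Index the add-ons by product so each purchase can look up its matches.
    by_product = defaultdict(list)
    for product, added in addons:
        by_product[product].append(added)

    rows = []
    for name, product in purchases:
        matches = by_product.get(product, [])
        if matches:
            # One output row per matching add-on.
            rows.extend((name, added) for added in matches)
        elif how == "left":
            # A left join keeps the unmatched purchase with a missing add-on.
            rows.append((name, None))
        # An inner join simply drops the unmatched purchase.
    return rows

inner_rows = join_addons(purchases, addons, how="inner")
left_rows = join_addons(purchases, addons, how="left")
print(len(inner_rows), len(left_rows))
```

With how="left", purchases such as biscuit, ball and handsanitize survive with an empty add-on, which is the behaviour the question wants to avoid. With how="inner" they are dropped, which matches the expected output.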
Upvotes: 1