Reputation: 80
I have two data frames, Food and Drink.
food = {'fruit':['Apple', np.nan, 'Apple'],
'food':['Cake', 'Bread', np.nan]}
# Create DataFrame
food = pd.DataFrame(food)
fruit food
0 Apple Cake
1 NaN Bread
2 Apple NaN
drink = {'smoothie':['S_Strawberry', 'S_Watermelon', np.nan],
'tea':['T_white', np.nan, 'T_green']}
# Create DataFrame
drink = pd.DataFrame(drink)
smoothie tea
0 S_Strawberry T_white
1 S_Watermelon NaN
2 NaN T_green
The rows represent specific customers. I would like to make a co-occurrence matrix of food and drinks.
expected outcome: (columns and ids do not have to be in this order)
Apple Bread Cake
S_Strawberry 1.0 NaN 1.0
S_Watermelon NaN 1.0 NaN
T_white 1.0 NaN 1.0
T_green 1.0 NaN NaN
so far I can make a co-occurrence matrix for each of the df but I don't know how I would bind the two data frames.
thank you.
Upvotes: 3
Views: 49
Reputation: 150735
I think you want pd.get_dummies
and matrix multiplication:
pd.get_dummies(drink). T @ pd.get_dummies(food)
Output:
fruit_Apple food_Bread food_Cake
smoothie_S_Strawberry 1 0 1
smoothie_S_Watermelon 0 1 0
tea_T_green 1 0 0
tea_T_white 1 0 1
You can get rid of the prefixes with:
pd.get_dummies(drink, prefix='', prefix_sep=''). T @ pd.get_dummies(food, prefix='', prefix_sep='')
Output:
Apple Bread Cake
S_Strawberry 1 0 1
S_Watermelon 0 1 0
T_green 1 0 0
T_white 1 0 1
Upvotes: 2