Alis

Reputation: 80

Co-occurrence matrix from two data frames. Python

I have two data frames, food and drink.

import numpy as np
import pandas as pd

food = {'fruit':['Apple', np.nan, 'Apple'],
        'food':['Cake', 'Bread', np.nan]}
# Create DataFrame
food = pd.DataFrame(food)

    fruit   food
0   Apple   Cake
1   NaN     Bread
2   Apple   NaN

drink = {'smoothie':['S_Strawberry', 'S_Watermelon', np.nan],
        'tea':['T_white', np.nan, 'T_green']}
# Create DataFrame
drink = pd.DataFrame(drink)

    smoothie        tea
0   S_Strawberry    T_white
1   S_Watermelon    NaN
2   NaN             T_green

The rows represent specific customers. I would like to make a co-occurrence matrix of food and drinks.

Expected outcome (the order of rows and columns does not matter):

               Apple    Bread   Cake
            
S_Strawberry    1.0      NaN    1.0
S_Watermelon    NaN      1.0    NaN
T_white         1.0      NaN    1.0
T_green         1.0      NaN    NaN

So far I can make a co-occurrence matrix for each data frame separately, but I don't know how to combine the two data frames.

Thank you.

Upvotes: 3

Views: 49

Answers (1)

Quang Hoang

Reputation: 150735

I think you want pd.get_dummies and matrix multiplication:

pd.get_dummies(drink).T @ pd.get_dummies(food)

Output:

                       fruit_Apple  food_Bread  food_Cake
smoothie_S_Strawberry            1           0          1
smoothie_S_Watermelon            0           1          0
tea_T_green                      1           0          0
tea_T_white                      1           0          1

You can get rid of the prefixes with:

pd.get_dummies(drink, prefix='', prefix_sep='').T @ pd.get_dummies(food, prefix='', prefix_sep='')

Output:

              Apple  Bread  Cake
S_Strawberry      1      0     1
S_Watermelon      0      1     0
T_green           1      0     0
T_white           1      0     1
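
For completeness, here is a self-contained sketch of the same idea. Each get_dummies call produces one indicator column per item and one row per customer, so multiplying the transposed drink indicators by the food indicators counts the customers who have both items. Two things here are assumptions rather than part of the answer above: passing dtype=int (recent pandas versions return boolean indicators by default, which would not give integer counts), and the names drink_ind, food_ind, cooc and cooc_nan, which are only illustrative.

```python
import numpy as np
import pandas as pd

food = pd.DataFrame({'fruit': ['Apple', np.nan, 'Apple'],
                     'food': ['Cake', 'Bread', np.nan]})
drink = pd.DataFrame({'smoothie': ['S_Strawberry', 'S_Watermelon', np.nan],
                      'tea': ['T_white', np.nan, 'T_green']})

# One 0/1 indicator column per item; NaN cells produce no indicator.
# dtype=int keeps the indicators numeric so the product yields counts.
drink_ind = pd.get_dummies(drink, prefix='', prefix_sep='', dtype=int)
food_ind = pd.get_dummies(food, prefix='', prefix_sep='', dtype=int)

# (items x customers) @ (customers x items) -> drink-by-food counts
cooc = drink_ind.T @ food_ind

# Optional: show pairs that never co-occur as NaN, as in the question
cooc_nan = cooc.where(cooc > 0)

print(cooc)
print(cooc_nan)
```

The where call is only needed if you want NaN instead of 0 for pairs that never occur together. Note that it also converts the result to floats.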

Upvotes: 2
