jefflarkin
jefflarkin

Reputation: 1279

Taking a cross product of Pandas Dataframes

I am attempting to take string data from one dataframe and substitute it with numerical values and create a cross product of the results like the following below.

I read the data into dataframes, example input coming below:

import pandas as pd
shoppers = pd.DataFrame({'name': ['bill', 'bob', 'james', 'jill', 'henry'], 
                         'item': ['apple','apple','orange','grapes','orange']})

stores = pd.DataFrame({'apple' : [0.25, 0.20, 0.18], 
                       'orange': [0.30, 0.4, 0.35],
                       'grapes': [1.0, 0.9, 1.1], 
                       'store':  ['kroger', 'publix', 'walmart']})

Here's the resulting shoppers dataframe:

         item
name         
bill    apple
bob     apple
james  orange
jill   grapes
henry  orange

And here's the resulting stores dataframe:

         apple  orange  grapes
store                         
kroger    0.25    0.30     1.0
publix    0.20    0.40     0.9
walmart   0.18    0.35     1.1

And the desired result is the price each person would pay at each store. For example:

enter image description here

I'm really struggling to find the right way to make such a transformation in Pandas efficiently. I could easily loop over shoppers and stores and build each row in a brute-force manner, but there must be a more efficient way to do this with the pandas API. Thanks for any suggestions.

Upvotes: 1

Views: 467

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150745

Here's a solution, not cross, but dot:

pd.crosstab(shoppers.index, shoppers['item']).dot(stores.T)

Output:

       kroger  publix  walmart
row_0                         
bill     0.25     0.2     0.18
bob      0.25     0.2     0.18
henry    0.30     0.4     0.35
james    0.30     0.4     0.35
jill     1.00     0.9     1.10

Upvotes: 2

Related Questions