Reputation: 1279
I am attempting to take string data from one dataframe and substitute it with numerical values and create a cross product of the results like the following below.
I read the data into dataframes, example input coming below:
import pandas as pd
shoppers = pd.DataFrame({'name': ['bill', 'bob', 'james', 'jill', 'henry'],
'item': ['apple','apple','orange','grapes','orange']})
stores = pd.DataFrame({'apple' : [0.25, 0.20, 0.18],
'orange': [0.30, 0.4, 0.35],
'grapes': [1.0, 0.9, 1.1],
'store': ['kroger', 'publix', 'walmart']})
Here's the resulting shoppers dataframe:
item
name
bill apple
bob apple
james orange
jill grapes
henry orange
And here's the resulting stores dataframe:
apple orange grapes
store
kroger 0.25 0.30 1.0
publix 0.20 0.40 0.9
walmart 0.18 0.35 1.1
And the desired result is the price each person would pay at each store. For example:
I'm really struggling to find the right way to make such a transformation in Pandas efficiently. I could easily loop over shoppers and stores and build each row in a brute-force manner, but there must be a more efficient way to do this with the pandas API. Thanks for any suggestions.
Upvotes: 1
Views: 467
Reputation: 150745
Here's a solution, not cross, but dot:
pd.crosstab(shoppers.index, shoppers['item']).dot(stores.T)
Output:
kroger publix walmart
row_0
bill 0.25 0.2 0.18
bob 0.25 0.2 0.18
henry 0.30 0.4 0.35
james 0.30 0.4 0.35
jill 1.00 0.9 1.10
Upvotes: 2