Reputation: 7404
I have a data set indicating who has shopped at which stores.
ID Store
1 C
1 A
2 A
2 B
3 A
3 B
3 C
Can I use a pivot table to determine the frequency of a shopper going to other stores? I'm thinking like a 3X3 matrix where the columns and rows would indicate how many people went to both stores.
Desired output
A B C
A 3 2 2
B 2 3 1
C 2 1 3
Upvotes: 1
Views: 64
Reputation: 862671
Another faster solution with groupby
, unstack
and dot
:
df = df.groupby(['ID','Store']).size().unstack(fill_value=0)
df = df.T.dot(df)
print (df)
Store A B C
Store
A 3 2 2
B 2 2 1
C 2 1 2
Timings:
In [119]: %timeit (jez(df))
1000 loops, best of 3: 1.72 ms per loop
In [120]: %timeit (psi(df))
100 loops, best of 3: 7.07 ms per loop
Code for timings:
N = 1000
df = pd.DataFrame({'ID':np.random.choice(5, N),
'Store': np.random.choice(list('ABCDEFGHIJK'), N)})
print (df)
def jez(df):
df = df.groupby(['ID','Store']).size().unstack(fill_value=0)
return df.T.dot(df)
def psi(df):
mat = pd.crosstab(df.ID, df.Store)
return mat.T.dot(mat)
print (jez(df))
print (psi(df))
Upvotes: 2
Reputation: 214957
You can create a conditional table of ID
and Store
with pd.crosstab()
and then calculate the matrix product of its transpose and itself, which should produce what you need:
mat = pd.crosstab(df.ID, df.Store)
mat.T.dot(mat)
#Store A B C
#Store
# A 3 2 2
# B 2 2 1
# C 2 1 2
Note: Since only two IDs visited store B
and C
, I suppose the corresponding cells should be 2 instead of 3:
Upvotes: 2