Demetri Pananos
Demetri Pananos

Reputation: 7404

Pivot Table and Counting

I have a data set indicating who has shopped at which stores.

ID  Store
1    C
1    A
2    A
2    B
3    A
3    B
3    C

Can I use a pivot table to determine the frequency of a shopper going to other stores? I'm thinking like a 3X3 matrix where the columns and rows would indicate how many people went to both stores.

Desired output
    A   B  C
A   3   2   2
B   2   3   1
C   2   1    3

Upvotes: 1

Views: 64

Answers (2)

jezrael
jezrael

Reputation: 862671

Another faster solution with groupby, unstack and dot:

df = df.groupby(['ID','Store']).size().unstack(fill_value=0)
df = df.T.dot(df)
print (df)
Store  A  B  C
Store         
A      3  2  2
B      2  2  1
C      2  1  2

Timings:

In [119]: %timeit (jez(df))
1000 loops, best of 3: 1.72 ms per loop

In [120]: %timeit (psi(df))
100 loops, best of 3: 7.07 ms per loop

Code for timings:

N = 1000
df = pd.DataFrame({'ID':np.random.choice(5, N), 
                   'Store': np.random.choice(list('ABCDEFGHIJK'), N)})
print (df)


def jez(df):
    df = df.groupby(['ID','Store']).size().unstack(fill_value=0)
    return df.T.dot(df)

def psi(df):
    mat = pd.crosstab(df.ID, df.Store)    
    return mat.T.dot(mat)

print (jez(df))
print (psi(df))

Upvotes: 2

akuiper
akuiper

Reputation: 214957

You can create a conditional table of ID and Store with pd.crosstab() and then calculate the matrix product of its transpose and itself, which should produce what you need:

mat = pd.crosstab(df.ID, df.Store)    
mat.T.dot(mat)

#Store  A   B   C
#Store          
#   A   3   2   2
#   B   2   2   1
#   C   2   1   2

Note: Since only two IDs visited store B and C, I suppose the corresponding cells should be 2 instead of 3:

Upvotes: 2

Related Questions