divingTobi
divingTobi

Reputation: 2300

Items appearing together in a basket

I am completely stuck - I would like to count the number of cases an item appears together with another item in a basket, i.e. some correlation:

   basket item
0       a    x
1       a    y
2       a    z
3       b    x
4       b    y
5       c    k
6       c    y
7       c    z
8       d    x
9       d    y
10      d    z
11      d    l

That is, items x, y, and z appear together in basket a, items x and y appear together in basket b, etc.

I am looking for a way to get here:

    x   y   z   k   l
x       3   2       1
y   3       3   1   1
z   2   3       1   1
k       1   1       
l   1   1   1       

The matrix is obviously symmetric. The first 3 in the top row means, that x and y appear 3 times together in one basket (namely a, b and d), etc.

Thanks a bunch!

Upvotes: 1

Views: 97

Answers (1)

jezrael
jezrael

Reputation: 862481

Use DataFrame.merge by same DataFrame with df, then remove rows with same item values and last count combinations by crosstab, last if need order of values like original add DataFrame.reindex:

df1 = df.merge(df, on='basket')
i = df['item'].unique()
df1 = df1[df1.item_x != df1.item_y]
df2 = pd.crosstab(df1.item_x, df1.item_y).reindex(index=i, columns=i)

print (df2)
item_y  x  y  z  k  l
item_x               
x       0  3  2  0  1
y       3  0  3  1  1
z       2  3  0  1  1
k       0  1  1  0  0
l       1  1  1  0  0

If need missing values use DataFrame.pivot_table:

df1 = df.merge(df, on='basket')
i = df['item'].unique()
df1 = df1[df1.item_x != df1.item_y]
df2 = df1.pivot_table(index='item_x', 
                      columns='item_y', 
                      aggfunc='size').reindex(index=i, columns=i)

print (df2)
item_y    x    y    z    k    l
item_x                         
x       NaN  3.0  2.0  NaN  1.0
y       3.0  NaN  3.0  1.0  1.0
z       2.0  3.0  NaN  1.0  1.0
k       NaN  1.0  1.0  NaN  NaN
l       1.0  1.0  1.0  NaN  NaN

Upvotes: 1

Related Questions