Felix Torssell
Felix Torssell

Reputation: 45

Count which combination of items are bought most frequently

What is an efficient method of doing this?

I have a column with name of buyer and a column with item names. Each item the person bought is on a new row

For example:

Person 1    Item 1
Person 1    Item 2
Person 1    Item 5
Person 1    Item 7
Person 2    Item 1
Person 2    Item 2
Person 2    Item 11
Person 2    Item 15
Person 2    Item 20
Person 2    Item 21
Person 2    Item 17
Person 3    Item 1
Person 3    Item 2
Person 3    Item 6
Person 3    Item 11
Person 3    Item 15
Person 4    Item 1
Person 4    Item 2
Person 4    Item 5
Person 4    Item 7

There are about 1000000 rows in total and each person has an average of 30 items.

I want to count how often two specific items are bought by a person. I am picturing it something like this

      Item1 Item2 Item3 Item4 Item5 Item6
Item1 xxxxx    0%   0%    5%    10%  90%
Item2   
Item3
Item4   
Item5   
Item6

I have tried using pivot table putting item on row labels and person on column labels then counting items. Then I can use a formula lookup and multiply the results from the pivot table but this is doesn't work with such a large file. Is there a more efficient method?

I am open to all kinds of solutions.

Upvotes: 0

Views: 2321

Answers (2)

Robin Mackenzie
Robin Mackenzie

Reputation: 19319

You can use a helper 'table' to do this. First create a table of purchases by person. The formula in this table is:

=SUMPRODUCT(--($A$1:$A$20=E$2),--($B$1:$B$20=$D3))

Which gives a 1/0 result if a person ever bought that item. Example:

enter image description here

Then create the grid of products like in your post and enter this formula:

=SUMPRODUCT($E3:$H3,INDEX($E$3:$H$12,MATCH(K$2,$D$3:$D$12,0),0))

Which multiples instances of purchase of Item X and Item Y. Example:enter image description here

Upvotes: 1

am2
am2

Reputation: 371

Maybe I misunderstand you, but you are not interested in the person, who buys but in the items, bought by the same person? I do not think you can do this in one step only using formulas (of course in vba you can do it easier).

To do it w/o vba you could:

  1. List item
  2. Sort by Person and Item
  3. Build an accumulating string with all (different) items bought by one person (untested: something like IF(A1=A2;B1;"")&B2
  4. Ignore all strings but the last of a person (something like IF(A2=A3;"";B2)

After this you have something like

P   I   Items_a All_Items
1   A   A   
1   B   AB  
1   E   ABE 
1   G   ABEG    ABEG
2   A   A   
2   B   AB  
2   K   ABK 
2   O   ABKO    
2   Q   ABKOQ   
2   T   ABKOQT  
2   U   ABKOQTU ABKOQTU
3   A   A   
3   B   AB  
3   F   ABF 
3   K   ABFK    
3   O   ABFKO   ABFKO
4   A   A   
4   B   AB  
4   E   ABE 
4   G   ABEG    ABEG

In the next step you could copy all combinations to a new table, and build all combinations (ascending, cause items were sorted) in a column and mark as 1 if condition matches

To explain it easier the items are named A, B, ... (corresponding to Item 1, Item 2 ... in your example) The formula is something like =IF(IS_ERROR(FIND(PART(B$1;1;1);$A2));0;1)*IF(IS_ERROR(FIND(PART(B$1;2;1);$A2));0;1)

And your case it would be the following possible combinations

AB AE AF AG AK AO AQ AT AU BE BF BG BK BO BQ BT BU EF EG EK EO EQ ET EU FG FK FO FQ FT FU GK GO GQ GT GU KO KQ KT KU OQ OT OU QT QU TU

But in this example 66% exist, So I only Show the beginning of the table:

XXXXX   AB  AE  AF  AG  AK  AO  AQ  AT  AU  BE  BF
ABEG    1   1   0   1   0   0   0   0   0   1   0
ABEG    1   1   0   1   0   0   0   0   0   1   0
ABFKO   1   0   1   0   1   1   0   0   0   0   1
ABKOQTU 1   0   0   0   1   1   1   1   1   0   0
SUM ALL 4   2   1   2   2   2   1   1   1   2   1

And now you can count, whatever you want.

A simple WVERWEIS function would help to get this:

    A   B   E   F   G   K   O   Q   T   U
A   0   4   2   1   2   2   2   1   1   1
B   0   0   2   1   2   2   2   1   1   1
E   0   0   0   0   2   0   0   0   0   0
F   0   0   0   0   0   1   1   0   0   0
G   0   0   0   0   0   0   0   0   0   0
K   0   0   0   0   0   0   2   1   1   1
O   0   0   0   0   0   0   0   1   1   1
Q   0   0   0   0   0   0   0   0   1   1
T   0   0   0   0   0   0   0   0   0   1
U   0   0   0   0   0   0   0   0   0   0

But for my opinion you can handle that for maybe 10 Items (the Help- Cols would be n*(n-1)/2, 10 Items --> 45 Columns (cause AA, BB, ... are not evaluated)

In all other cases you should try to program that.

Upvotes: 0

Related Questions