Reputation: 45
What is an efficient method of doing this?
I have a column with name of buyer and a column with item names. Each item the person bought is on a new row
For example:
Person 1 Item 1
Person 1 Item 2
Person 1 Item 5
Person 1 Item 7
Person 2 Item 1
Person 2 Item 2
Person 2 Item 11
Person 2 Item 15
Person 2 Item 20
Person 2 Item 21
Person 2 Item 17
Person 3 Item 1
Person 3 Item 2
Person 3 Item 6
Person 3 Item 11
Person 3 Item 15
Person 4 Item 1
Person 4 Item 2
Person 4 Item 5
Person 4 Item 7
There are about 1000000 rows in total and each person has an average of 30 items.
I want to count how often two specific items are bought by a person. I am picturing it something like this
Item1 Item2 Item3 Item4 Item5 Item6
Item1 xxxxx 0% 0% 5% 10% 90%
Item2
Item3
Item4
Item5
Item6
I have tried using pivot table putting item on row labels and person on column labels then counting items. Then I can use a formula lookup and multiply the results from the pivot table but this is doesn't work with such a large file. Is there a more efficient method?
I am open to all kinds of solutions.
Upvotes: 0
Views: 2321
Reputation: 19319
You can use a helper 'table' to do this. First create a table of purchases by person. The formula in this table is:
=SUMPRODUCT(--($A$1:$A$20=E$2),--($B$1:$B$20=$D3))
Which gives a 1/0 result if a person ever bought that item. Example:
Then create the grid of products like in your post and enter this formula:
=SUMPRODUCT($E3:$H3,INDEX($E$3:$H$12,MATCH(K$2,$D$3:$D$12,0),0))
Which multiples instances of purchase of Item X and Item Y. Example:
Upvotes: 1
Reputation: 371
Maybe I misunderstand you, but you are not interested in the person, who buys but in the items, bought by the same person? I do not think you can do this in one step only using formulas (of course in vba you can do it easier).
To do it w/o vba you could:
After this you have something like
P I Items_a All_Items
1 A A
1 B AB
1 E ABE
1 G ABEG ABEG
2 A A
2 B AB
2 K ABK
2 O ABKO
2 Q ABKOQ
2 T ABKOQT
2 U ABKOQTU ABKOQTU
3 A A
3 B AB
3 F ABF
3 K ABFK
3 O ABFKO ABFKO
4 A A
4 B AB
4 E ABE
4 G ABEG ABEG
In the next step you could copy all combinations to a new table, and build all combinations (ascending, cause items were sorted) in a column and mark as 1 if condition matches
To explain it easier the items are named A, B, ... (corresponding to Item 1, Item 2 ... in your example) The formula is something like =IF(IS_ERROR(FIND(PART(B$1;1;1);$A2));0;1)*IF(IS_ERROR(FIND(PART(B$1;2;1);$A2));0;1)
And your case it would be the following possible combinations
AB AE AF AG AK AO AQ AT AU BE BF BG BK BO BQ BT BU EF EG EK EO EQ ET EU FG FK FO FQ FT FU GK GO GQ GT GU KO KQ KT KU OQ OT OU QT QU TU
But in this example 66% exist, So I only Show the beginning of the table:
XXXXX AB AE AF AG AK AO AQ AT AU BE BF
ABEG 1 1 0 1 0 0 0 0 0 1 0
ABEG 1 1 0 1 0 0 0 0 0 1 0
ABFKO 1 0 1 0 1 1 0 0 0 0 1
ABKOQTU 1 0 0 0 1 1 1 1 1 0 0
SUM ALL 4 2 1 2 2 2 1 1 1 2 1
And now you can count, whatever you want.
A simple WVERWEIS function would help to get this:
A B E F G K O Q T U
A 0 4 2 1 2 2 2 1 1 1
B 0 0 2 1 2 2 2 1 1 1
E 0 0 0 0 2 0 0 0 0 0
F 0 0 0 0 0 1 1 0 0 0
G 0 0 0 0 0 0 0 0 0 0
K 0 0 0 0 0 0 2 1 1 1
O 0 0 0 0 0 0 0 1 1 1
Q 0 0 0 0 0 0 0 0 1 1
T 0 0 0 0 0 0 0 0 0 1
U 0 0 0 0 0 0 0 0 0 0
But for my opinion you can handle that for maybe 10 Items (the Help- Cols would be n*(n-1)/2, 10 Items --> 45 Columns (cause AA, BB, ... are not evaluated)
In all other cases you should try to program that.
Upvotes: 0