Brock Sibert
Brock Sibert

Reputation: 5

Finding Duplicates across a thousand lists

I have over 1,100 lists that each contain no more than 30 items in them. I am trying to see if there are any items within the lists that appear in all lists. I was initially thinking that I would need to compare the list in column A to the list in column B, store the duplicates, then compare the duplicates to the list in Column C, store the new duplicates, compare the new duplicates to the list in Column D, and so on until all the lists have been covered.

My questions are: 1.) Is this the correct way to approach this? 2.) If so, is there a simple VBA code that could be used to do this?

Upvotes: 0

Views: 139

Answers (3)

Brajesh
Brajesh

Reputation: 11

Please try to use thisDuplicate value in excel If it will not work I can help you with Macro VB code.
Logic will be as below:
1. Keep 1st column as base to check all the other column
2. Check each 30 cell of the 1st column in a loop with all the other column cell.
3. Stop the loop, if you don't the value in an entire column.

Upvotes: 0

Karl Kristjansson
Karl Kristjansson

Reputation: 328

Here's my non VBA solution to this fun problem. The plan is to search each item in any one list and compare to all the other lists in the table.

Start off by inserting a new "A" column to the left of your table. Copy any list and paste to A35.

  • if your goal is only to find items occuring in all lists, choose the smallest list.
  • if you would like to analyse, choose the largest list or even multiple lists.
  • you could include all items by copy/paste TRANSPOSE the entire table to new sheet. then you have less than 30 colums. copy paste each into one column and delete duplicates of this list with data--> remove duplicates.

Now you need to create a formula in cell B35 that searches for the string in A35 in the range B1:B30. You drag the formula all the way right and down.

=COUNTIF(B$1:B$30,$A30)

The results will be the count of each item found in each list. In order to see if any item is in all lists, then all columns within the specific row should count at least 1 item. To the right of the results, see what the minimum value in the row is with:

=MIN(B35:API35)

(assuming your table ends in column API)

If any of your rows have a minimum of 1, then the item is included in all lists.

You could then also sum up the line to see which items occur the most and you could use the "max" instead of "min" to see if any list has duplicates.

Upvotes: 0

MattClarke
MattClarke

Reputation: 1647

  1. Deduplicate each list using Data > Remove Duplicates
  2. Collate all the lists into one long list
  3. Create a pivot table with the column of items as the Row dimension
  4. Use the same column as the Value displayed in the pivot table, and aggregate using Count.
  5. Sort the pivot table in descending order of that count.

The count shows the number of lists in which each item appears. If any have a count of 1100 then they must occur in every list.

Upvotes: 2

Related Questions