Caroline
Caroline

Reputation: 23

Concatenate 1000+ cells in a row in Excel to identify duplicates

I need to compare a large set of data. I have a matrix of 1124 rows and 700 columns and I need to somehow identify which rows are identical (excluding the first row which is an ID).

I have tried various ways to identify duplicates but I cannot seem to find a way to identify duplicates across a while range of cells where I want to compare the whole row (excluding column A) with all other rows.

I have also tried to concatenate all of the data (each column only has a 1 or a 0 in it) but I cannot find any way to concatenate a range that I can then drag down for all 900+ rows.

Maybe there are other ways to do this that I haven't thought of. Would be very appreciative of any ideas or suggestions as google has failed me thus far. I will mention that I have found commercial tools that you can plug into excel to do this but we are not permitted to download these in work so I cant even use a trial of one of these.

Thanks in advance for any suggestions.

UPDATE:

My first concatenation proble is solved but the reason for concatenating was to allow me to identify duplicates but I am stuck here. I now have my thousand characters in one cell but the conditional formatting of the duplicate cell values does nothing. I have tested this by copying one cell into 10 rows in a new sheet so I know the values are identical and still no good. I have also tried countifs but that doesn't work either. Any suggestions for how I could perhaps identify the duplicates?

Here is an example of one cells data. I need to compare about 1000 similar values to identify duplicates: 001000000000000111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000001000010000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Upvotes: 2

Views: 2762

Answers (4)

Bharat Anand
Bharat Anand

Reputation: 484

For the second part of your question, assuming that your data is present in cells A1 through A950 -

  1. Select these cells and sort in ascending order

  2. Leave the first cell in the list and select A2 through the last cell and then ..

  3. .. Click Conditional Formatting -> New Rule -> Use a formula to determine which cells to format

  4. Enter the formula =A1=A2, and set a Format of your choice

  5. Click OK, Click OK

This should result in every first value of a set of duplicates to be non-highlighted and all the following duplicates to be highlighted.

Hope this helps!

Upvotes: 0

Caroline
Caroline

Reputation: 23

ok,I think I'm sorted, copying the columns to notepad and removing he tabs and then pasting back into excel sorted out problem A.

For the comparison, I sorted by this column and then added a new column and put text in that if the value in the row was the same as the previous row. This allowed me to identify where there are a large bunch of duplicates.

Upvotes: 0

Bharat Anand
Bharat Anand

Reputation: 484

Select the data grid (your 900+ rows x 1000+ columns).

Copy

Open notepad and paste

The individual cells will be separated by a tab in your notepad. Select one of the tab characters and copy (Ctrl+C)

Find and Replace (Ctrl+H) the tab characters with nothing, hit Enter. All the tab characters are gone now and what you should have is concatenated strings of all the 0's and 1's.

If any of the strings starts with a 0, you would want to preserve that. Append a "'" character in front of each of these strings (if you need help with this I can find some easy way for you).

Now Select All in notepad and copy all the 900+ strings

Go to excel and paste it in a new column next to the last data column.

BINGO!!

Upvotes: 2

teylyn
teylyn

Reputation: 35915

A solution will very much depend on your Excel version. In Excel 2016 with an Office 365 subscription, you can use the new TextJoin function to concatenate cell values.

=TEXTJOIN("",FALSE,B3:ALM3)

Such long strings are not suitable for Countif(), though, so you may want to break down the data into several join columns instead of just one.

Upvotes: 1

Related Questions