Crops
Crops

Reputation: 5154

Detect duplication within delimited values in a cell in excel

I have some tabular data as follows.

|   | A        | B            | C                | D                                                 |
|---|----------|--------------|------------------|---------------------------------------------------|
|   |          | p1           | p2               | pn                                                |
| 1 | Lanterns | Bruce Wayne  | Jean-Paul Valley | Dick Grayson; Terry McGinnis; Jean-Paul Valley    |
| 2 | Bats     | Alan Scott   | Hal Jordan       | Guy Gardner; John Stewart; Kyle Rayner; Simon Baz |
| 3 | Fates    | Kent Nelson  | Khalid Nassour   | Hector Hall; Khalid Nassour; Khalid Ben-Hassin    |
| 4 | Supes    | Clark Kent   | John Henry Irons | Conner Kent; Hank Henshaw; Kong Kenan             |
| 5 | Spideys  | Peter Parker | Peter Parker     | Ben Reilly; Miles Morales                         |
| 6 | Irons    | Tony Stark   | Happy Hogan      | James Rhodes; Eddie March; James Rhodes           |

For each row, I want to find if duplication exists between columns B, C as well as semi colon separated values of column D.

How to do this in excel?

The desired output will be as follows.

| X | A        | B            | C                | D                                                 | E     |
|---|----------|--------------|------------------|---------------------------------------------------|-------|
|   |          | p1           | p2               | pn                                                |       |
| 1 | Lanterns | Bruce Wayne  | Jean-Paul Valley | Dick Grayson; Terry McGinnis; Jean-Paul Valley    | TRUE  |
| 2 | Bats     | Alan Scott   | Hal Jordan       | Guy Gardner; John Stewart; Kyle Rayner; Simon Baz | FALSE |
| 3 | Fates    | Kent Nelson  | Khalid Nassour   | Hector Hall; Khalid Nassour; Khalid Ben-Hassin    | TRUE  |
| 4 | Supes    | Clark Kent   | John Henry Irons | Conner Kent; Hank Henshaw; Kong Kenan             | FALSE |
| 5 | Spideys  | Peter Parker | Peter Parker     | Ben Reilly; Miles Morales                         | TRUE  |
| 6 | Irons    | Tony Stark   | Happy Hogan      | James Rhodes; Eddie March; James Rhodes           | TRUE  |

Edit There was an error in the column names in the question causing lack in clarity. Fixed it now.

Update

Here is my attempt with VBA as suggested by @Foxfire And Burns And Burns. It is adapted from https://superuser.com/a/1005497/460054

Public Function HasDuplicates(list As String, delimiter As String) As String
Dim arrSplit As Variant, i As Long, tmpDict As Object, tmpOutput As Boolean
Set tmpDict = CreateObject("Scripting.Dictionary")
arrSplit = Split(list, delimiter)
tmpOutput = False
For i = LBound(arrSplit) To UBound(arrSplit)
    If tmpDict.Exists(Trim(arrSplit(i))) Then
        tmpOutput = True
        Exit For
    Else
    tmpDict.Add Trim(arrSplit(i)), Trim(arrSplit(i))
    End If
Next i
HasDuplicates = tmpOutput
'housekeeping
Set tmpDict = Nothing
End Function

Here is all possibel use cases again as suggested by @Foxfire And Burns And Burns.

+---+-----+----+-----------+--------------------+-------+
|   |  A  | B  |     C     |         D          |   E   |
+---+-----+----+-----------+--------------------+-------+
| 1 | A   | B  |           | A; B;              | False |
| 2 | A   |    |           | A; ;               | True  |
| 3 |     |    |           | ; ;                | True  |
| 4 | G   | K  | G         | G; K; G            | True  |
| 5 | N   | M  | O         | N; M; O            | False |
| 6 | N   | N  | O         | N; N; O            | True  |
| 7 | V   | U  | X; Y; X   | V; U; X; Y; X      | True  |
| 8 | P J | VK | P; J; V K | P J; VK; P; J; V K | False |
| 9 | VK  | O  | R; VK     | VK; O; R; VK       | True  |
+---+-----+----+-----------+--------------------+-------+

The formula for columnD is =CONCATENATE(B2,"; ",C2, "; ",D2) and for E is =HasDuplicates(E2, ";").

But here it is not handling empty cells. Rows 2 and 3 should also be False.

Upvotes: 2

Views: 133

Answers (5)

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60334

If you have O365 or Excel 2016 with the TEXTJOIN function:

=NOT(ISERROR(FILTERXML("<t><s>" &TEXTJOIN("</s><s>",TRUE,TRIM(B2),TRIM(C2),SUBSTITUTE(TRIM(D2),"; ","</s><s>"))& "</s></t>","//s[.=./following-sibling::*]")))

If you don't have TEXTJOIN, but do have FILTERXML, you can use:

=NOT(ISERROR(FILTERXML("<t><s>"&TRIM(B2)&"</s><s>"&TRIM(C2)&"</s><s>"&SUBSTITUTE(TRIM(D2),"; ","</s><s>")&"</s></t>","//s[.=./following-sibling::*]")))

enter image description here

We construct an XML of all the names in separate nodes, and then look for duplicates.

Without the NOT(ISERROR(… part, the formula will return the name of the duplicate (or an array of the names if there are more than one set of duplicates).

NOTE: The formula depends on the separator in Column D being ; (semicolon-space). If the space will not always be there, the formula will need modification to remove it if present (nested substitutes, or TRIM's would do that).

eg

=NOT(ISERROR(FILTERXML("<t><s>"&TRIM(B11)&"</s><s>"&TRIM(C11)&"</s><s>"&SUBSTITUTE(SUBSTITUTE(TRIM(D11),"; ",";"),";","</s><s>")&"</s></t>","//s[.=./following-sibling::*]")))

Results on 2nd Test

enter image description here

If you have an earlier version of Excel, and can use a VBA solution, try:

Option Explicit
Function hasDups(rg As Range, Optional sDelim As String = ";") As Boolean
    Dim myDict As Object
    Dim x, y, s As String, i As Long, c As Range

Set myDict = CreateObject("scripting.dictionary")

For Each c In rg
    x = Split(c.Value2, sDelim)
    For Each y In x
      If Len(Trim(y)) > 0 Then
        If Not myDict.exists(Trim(y)) Then
            myDict.Add Trim(y), y
        Else
            hasDups = True
            Exit Function
        End If
      End If
    Next y
Next c

End Function

Upvotes: 6

JvdV
JvdV

Reputation: 75910

It's possible through formula but there are some considerations:

  • Because of the names can be either Tony Stark and Anthony Stark or Paul Valley and Jean-Paul Valley you can't use SEARCH
  • You'll have to concatenate the three columns and turn that into an array
  • Check the array for duplicates (unfortunately without COUNTIF), with
  • Check the result against the total of items in the array.

    =SUMPRODUCT(--(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(B2&"; "&C2&"; "&D2,"",""),"; ","</s><s>")&"</s></t>","//s")=TRANSPOSE(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(B2&"; "&C2&"; "&D2,"",""),"; ","</s><s>")&"</s></t>","//s"))))<>LEN(B2&"; "&C2&"; "&D2)-LEN(SUBSTITUTE(B2&"; "&C2&"; "&D2,";",""))+1
    

It's a lenghty one, but FILTERXML puts our values in an array rather nicely, which we then can use.

Note: It's an array formula and needs to be confirmed through CtrlShiftEnter

enter image description here

Upvotes: 4

If your values got extra spaces, maybe it can help to add a TRIM function to clean it.

I got this:

enter image description here

Column E got this formula:

=IFERROR(SEARCH(TRIM(C1);D1);0)>0

The output will be a boolean value TRUE/FALSE In screenshot it's in spanish but VERDADERO means TRUE and FALSO means FALSE.

In row 5 and 6 the output is FALSE becaue values Peter Parker and Happy Hogan are not found in column D. You posted in your wished output that these 2 rows should return TRUE, but you have not explained why.

Upvotes: 1

PASUMPON V N
PASUMPON V N

Reputation: 1186

Could you please try this formula

=IFERROR(IF(SEARCH(C1,D1,1)>0,"TRUE"), "FALSE")

Upvotes: 2

riskypenguin
riskypenguin

Reputation: 2199

Try this formula in cell E1 and copy it down:

=IF(ISNUMBER(SEARCH(C1;D1));TRUE;FALSE)

You might need to change the semicolons ; to regular commas , depending on your version of Excel.

EDIT: A simpler and more elegant version would be

=ISNUMBER(SEARCH(C1;D1))

Upvotes: 2

Related Questions