Reputation: 5154
I have some tabular data as follows.
| | A | B | C | D |
|---|----------|--------------|------------------|---------------------------------------------------|
| | | p1 | p2 | pn |
| 1 | Lanterns | Bruce Wayne | Jean-Paul Valley | Dick Grayson; Terry McGinnis; Jean-Paul Valley |
| 2 | Bats | Alan Scott | Hal Jordan | Guy Gardner; John Stewart; Kyle Rayner; Simon Baz |
| 3 | Fates | Kent Nelson | Khalid Nassour | Hector Hall; Khalid Nassour; Khalid Ben-Hassin |
| 4 | Supes | Clark Kent | John Henry Irons | Conner Kent; Hank Henshaw; Kong Kenan |
| 5 | Spideys | Peter Parker | Peter Parker | Ben Reilly; Miles Morales |
| 6 | Irons | Tony Stark | Happy Hogan | James Rhodes; Eddie March; James Rhodes |
For each row, I want to find if duplication exists between columns B, C as well as semi colon separated values of column D.
How to do this in excel?
The desired output will be as follows.
| X | A | B | C | D | E |
|---|----------|--------------|------------------|---------------------------------------------------|-------|
| | | p1 | p2 | pn | |
| 1 | Lanterns | Bruce Wayne | Jean-Paul Valley | Dick Grayson; Terry McGinnis; Jean-Paul Valley | TRUE |
| 2 | Bats | Alan Scott | Hal Jordan | Guy Gardner; John Stewart; Kyle Rayner; Simon Baz | FALSE |
| 3 | Fates | Kent Nelson | Khalid Nassour | Hector Hall; Khalid Nassour; Khalid Ben-Hassin | TRUE |
| 4 | Supes | Clark Kent | John Henry Irons | Conner Kent; Hank Henshaw; Kong Kenan | FALSE |
| 5 | Spideys | Peter Parker | Peter Parker | Ben Reilly; Miles Morales | TRUE |
| 6 | Irons | Tony Stark | Happy Hogan | James Rhodes; Eddie March; James Rhodes | TRUE |
Edit There was an error in the column names in the question causing lack in clarity. Fixed it now.
Update
Here is my attempt with VBA as suggested by @Foxfire And Burns And Burns
.
It is adapted from https://superuser.com/a/1005497/460054
Public Function HasDuplicates(list As String, delimiter As String) As String
Dim arrSplit As Variant, i As Long, tmpDict As Object, tmpOutput As Boolean
Set tmpDict = CreateObject("Scripting.Dictionary")
arrSplit = Split(list, delimiter)
tmpOutput = False
For i = LBound(arrSplit) To UBound(arrSplit)
If tmpDict.Exists(Trim(arrSplit(i))) Then
tmpOutput = True
Exit For
Else
tmpDict.Add Trim(arrSplit(i)), Trim(arrSplit(i))
End If
Next i
HasDuplicates = tmpOutput
'housekeeping
Set tmpDict = Nothing
End Function
Here is all possibel use cases again as suggested by @Foxfire And Burns And Burns
.
+---+-----+----+-----------+--------------------+-------+
| | A | B | C | D | E |
+---+-----+----+-----------+--------------------+-------+
| 1 | A | B | | A; B; | False |
| 2 | A | | | A; ; | True |
| 3 | | | | ; ; | True |
| 4 | G | K | G | G; K; G | True |
| 5 | N | M | O | N; M; O | False |
| 6 | N | N | O | N; N; O | True |
| 7 | V | U | X; Y; X | V; U; X; Y; X | True |
| 8 | P J | VK | P; J; V K | P J; VK; P; J; V K | False |
| 9 | VK | O | R; VK | VK; O; R; VK | True |
+---+-----+----+-----------+--------------------+-------+
The formula for columnD is =CONCATENATE(B2,"; ",C2, "; ",D2)
and for E is =HasDuplicates(E2, ";")
.
But here it is not handling empty cells. Rows 2 and 3 should also be False
.
Upvotes: 2
Views: 133
Reputation: 60334
If you have O365 or Excel 2016 with the TEXTJOIN
function:
=NOT(ISERROR(FILTERXML("<t><s>" &TEXTJOIN("</s><s>",TRUE,TRIM(B2),TRIM(C2),SUBSTITUTE(TRIM(D2),"; ","</s><s>"))& "</s></t>","//s[.=./following-sibling::*]")))
If you don't have TEXTJOIN
, but do have FILTERXML
, you can use:
=NOT(ISERROR(FILTERXML("<t><s>"&TRIM(B2)&"</s><s>"&TRIM(C2)&"</s><s>"&SUBSTITUTE(TRIM(D2),"; ","</s><s>")&"</s></t>","//s[.=./following-sibling::*]")))
We construct an XML of all the names in separate nodes, and then look for duplicates.
Without the NOT(ISERROR(…
part, the formula will return the name of the duplicate (or an array of the names if there are more than one set of duplicates).
NOTE: The formula depends on the separator in Column D being ;
(semicolon-space
). If the space will not always be there, the formula will need modification to remove it if present (nested substitutes, or TRIM's would do that).
eg
=NOT(ISERROR(FILTERXML("<t><s>"&TRIM(B11)&"</s><s>"&TRIM(C11)&"</s><s>"&SUBSTITUTE(SUBSTITUTE(TRIM(D11),"; ",";"),";","</s><s>")&"</s></t>","//s[.=./following-sibling::*]")))
Results on 2nd Test
If you have an earlier version of Excel, and can use a VBA solution, try:
Option Explicit
Function hasDups(rg As Range, Optional sDelim As String = ";") As Boolean
Dim myDict As Object
Dim x, y, s As String, i As Long, c As Range
Set myDict = CreateObject("scripting.dictionary")
For Each c In rg
x = Split(c.Value2, sDelim)
For Each y In x
If Len(Trim(y)) > 0 Then
If Not myDict.exists(Trim(y)) Then
myDict.Add Trim(y), y
Else
hasDups = True
Exit Function
End If
End If
Next y
Next c
End Function
Upvotes: 6
Reputation: 75910
It's possible through formula but there are some considerations:
Tony Stark
and Anthony Stark
or Paul Valley
and Jean-Paul Valley
you can't use SEARCH
COUNTIF
), with Check the result against the total of items in the array.
=SUMPRODUCT(--(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(B2&"; "&C2&"; "&D2,"",""),"; ","</s><s>")&"</s></t>","//s")=TRANSPOSE(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(B2&"; "&C2&"; "&D2,"",""),"; ","</s><s>")&"</s></t>","//s"))))<>LEN(B2&"; "&C2&"; "&D2)-LEN(SUBSTITUTE(B2&"; "&C2&"; "&D2,";",""))+1
It's a lenghty one, but FILTERXML
puts our values in an array rather nicely, which we then can use.
Note: It's an array formula and needs to be confirmed through CtrlShiftEnter
Upvotes: 4
Reputation: 11988
If your values got extra spaces, maybe it can help to add a TRIM function to clean it.
I got this:
Column E got this formula:
=IFERROR(SEARCH(TRIM(C1);D1);0)>0
The output will be a boolean value TRUE/FALSE
In screenshot it's in spanish but VERDADERO means TRUE
and FALSO means FALSE
.
In row 5 and 6 the output is FALSE
becaue values Peter Parker
and Happy Hogan
are not found in column D. You posted in your wished output that these 2 rows should return TRUE
, but you have not explained why.
Upvotes: 1
Reputation: 1186
Could you please try this formula
=IFERROR(IF(SEARCH(C1,D1,1)>0,"TRUE"), "FALSE")
Upvotes: 2
Reputation: 2199
Try this formula in cell E1 and copy it down:
=IF(ISNUMBER(SEARCH(C1;D1));TRUE;FALSE)
You might need to change the semicolons ;
to regular commas ,
depending on your version of Excel.
EDIT: A simpler and more elegant version would be
=ISNUMBER(SEARCH(C1;D1))
Upvotes: 2