Reputation: 355
I've been given a long list of combinations of different equipment that customers might have. It looks like what the person who made the list did was just concatenate all the columns that contain equipment product numbers. The problem is that their is no real order in how the columns are filled out. So a customer who got equipment B first and then equipment A would now have a value of B/A while another customer could also have a value of A/B. I'm trying to use either excel or SAS to figure out how many distinct combinations there are REGARDLESS of the order so in this case A/B=b/A and their would only be 1, not 2 combinations. Is there a way to figure this out using EXCEL or SAS because right now I'm having to eyeball it. As always thank you so so much for your help!!
Sandy
Upvotes: 1
Views: 3472
Reputation: 68
This is quick way of providing all unique combinations of equipment in Excel without using VBA.
Next to your combinations text insert columns for each of the equipment pieces - one column per piece of equipment - see the draft below. Each column returns TRUE if the equipment combination contains this column's equipment and FALSE otherwise.
EQUIPMENT CONTAINS_A CONTAINS_B CONTAINS_C CONTAINS_ D Sort_Column
COMBINATION
-------------------------------------------------------------------------------------------
A/B/C TRUE TRUE TRUE FALSE TRUETRUETRUEFALSE
B/A/D/C TRUE TRUE TRUE TRUE TRUETRUETRUETRUE
A/B TRUE TRUE FALSE FALSE TRUETRUEFALSEFALSE
C FALSE FALSE FALSE TRUE FALSEFALSETRUEFALSE
The last column is combining all the TRUE/FALSE values in a text. This text can be filtered and/or sorted to easily identify the unique combinations. If you apply Auto Filter the entries in the AutoFilter will show unique combinations.
These are the formulas I have used behind the values (showing formulas in row2):
CONTAINS A: =NOT(ISERROR(SEARCH("A",F2,1)))
CONTAINS B: =NOT(ISERROR(SEARCH("B",F2,1)))
CONTAINS C: =NOT(ISERROR(SEARCH("C",F2,1)))
CONTAINS D: =NOT(ISERROR(SEARCH("D",F2,1)))
Sort Column: =TEXT(H2,"#") & TEXT(I2,"#") & TEXT(J2,"#") & TEXT(K2,"#")
Sorry about the format of this but I am not allowed to post images :-( Hope it helps!
Upvotes: 1
Reputation: 138
I've had time to put this together for you as an Excel formula. See my original comments at the foot of this posting for links to my sources.
Steps to resolve this:
User Defined Function to sort a cell
Create a new module in Excel VBA (let me know if you need instructions on this, it's only a couple of clicks)
Copy and paste all the code below.
Option Explicit
Const c_Separator = "/"
' User Defined Function to split a list within a cell and then sort it
' before recreating a sorted list
Public Function CellSort(strString As String) As String
Dim i As Integer
Dim arr As Variant
Dim strRet As String
arr = Split(strString, c_Separator)
' trim values so sort will work properly
For i = LBound(arr) To UBound(arr)
arr(i) = Trim(arr(i))
Next i
' sort
QuickSort arr, LBound(arr), UBound(arr)
' construct ordered list to return
For i = LBound(arr) To UBound(arr) - 1
strRet = strRet & CStr(arr(i)) & c_Separator
Next i
' Attach the last item separately to avoid adding an unecessary separator
CellSort = strRet & CStr(arr(i))
End Function
' Quick Sort function found here:
' https://stackoverflow.com/questions/3399823/excel-how-do-i-sort-within-a-cell
Public Sub QuickSort(vArray As Variant, inLow As Long, inHi As Long)
Dim pivot As Variant
Dim tmpSwap As Variant
Dim tmpLow As Long
Dim tmpHi As Long
tmpLow = inLow
tmpHi = inHi
pivot = vArray((inLow + inHi) \ 2)
While (tmpLow <= tmpHi)
While (vArray(tmpLow) < pivot And tmpLow < inHi)
tmpLow = tmpLow + 1
Wend
While (pivot < vArray(tmpHi) And tmpHi > inLow)
tmpHi = tmpHi - 1
Wend
If (tmpLow <= tmpHi) Then
tmpSwap = vArray(tmpLow)
vArray(tmpLow) = vArray(tmpHi)
vArray(tmpHi) = tmpSwap
tmpLow = tmpLow + 1
tmpHi = tmpHi - 1
End If
Wend
If (inLow < tmpHi) Then QuickSort vArray, inLow, tmpHi
If (tmpLow < inHi) Then QuickSort vArray, tmpLow, inHi
End Sub
Close the VBA editor (no more coding required).
Excel Formulas to Calculate number of Unique combinations In Excel you create three columns alongside your original column of data. I have shown these formulas in the screenshot below in the columns with coloured backgrounds. Each is explained beneath the image.
Blue column: Uses the VBA function above to sort the contents of each cell in the original list, this gives a consistent list that you can count unique items on. If your original list has some instances in lowercase and others in uppercase and you need to treat these as the same then modify the formula in this column to be =CellSort(UPPER(A2))
Green column: Simple COUNTIF
function (works in all recent Excel versions) that identifies the first instance of each sorted cell.
Red cell: Counts the number of times TRUE appears in the green column. This gives a count of unique entries.
And below is an example of the completed work.
Original comments
Afraid I don't have time to test this right now but this may help you.
You might want to look at the VBA given in this answer (I haven't tried it myself).
Then, if you need to do this in formulas, you could create a User Defined Function from this VBA to sort the values in the cell. Change the line arr = Split(ActiveCell.Text, ",")
to have "/"
so it will split your list.
Next use your formula in a column alongside the original data and then use something like the formulas on this page: Count occurrences of values or unique values in a data range to count unique.
Let me know if you need more help with any of the above and I will try to do it tomorrow.
Upvotes: 0
Reputation: 63424
In SAS, split them into component values and then sort the variables.
data have;
length value $50;
input value $;
datalines;
GummyBears/Skittles
Skittles/GummyBears
MandMs/Skittles
GummyBears/MandMs
Skittles/MandMs
GummyBears/MandMs
;;;;
run;
data want;
set have;
length vals1 vals2 $20;
array vals[2] $;
do _t = 1 to dim(vals);
vals[_t]=scan(value,_t,'/');
end;
call sortc(of vals[*]);
run;
proc freq data=want;
tables vals1*vals2/list;
run;
You could also recombine them afterwards if you prefer.
Upvotes: 3
Reputation: 3847
You might try writing some VBA code which can be used in Excel to change the way that A and B are displayed in the column. For example, for B/A, it would put it in alphabetical order and change it to A/B. For A/B, since it is already in alphabetical order, it would leave it as it is. After this is done, then sort the sheet by that column. You should then see data like A/B and the next row down could also have A/B displayed in it. Next, depending on what you need done, you can add some more VBA code to combine the rows by that type of column and write that data out to a new sheet. This would provide you with a unique count of all unique sets of equipment types owned by all customers. If you wanted unique counts of customers who might own A/B/C and combine the A/B portion with other customers who own just A/B, then that would be a little more work.
Upvotes: 0
Reputation: 2258
You could use the conditional formatting to highlight duplicates option. It would still be eye balling it, but you could potentially then sort by color to see matched/unmatched equipment.
An example on how to do this can be found here: http://www.techrepublic.com/blog/window-on-windows/how-to-find-duplicates-in-excel/7347
Upvotes: 0