Reputation: 177
I have 3 different sets of data (in different columns)
With these 3 data collections I would like to receive 5×1000×10 for a total of 50k corresponding elements in col. E F G (each animal who corresponds with each fruit and each country).
It might be done by manually copying and pasting values, but it will take ages. Is there any way to automate it by VBA code or
Is there any universal formula for unlimited data sets like the one presented above? Please let me know if something is not clear.
Here is a smaller example of data and how the results should turn out:
Upvotes: 15
Views: 3228
Reputation: 1
First you will need to put the data as follows: How to put your data
You will add a new column in which you will add the frequencies together. Do a simple recursive formula. (ex: f3+f4)
To take it to the modern version of Excel and the new function Xlookup, I propose this formula: =XLOOKUP(ROWS(K$2[a]:K2),$I$3:$I$8[b],$H$3:$H$8[c],"All frequencies met",1,1
where:
[a] is the column in which you want the data to be displayed. It is important to lock the number
[b] is the added frequencies together
[c] is the element to show at that frequency
How it works ? :
ROWS(K$2[a]:K2) : will determine the position in your column. At the first cell, it will consider itself at the first position. Next cell, it will be second, and so on.
XLOOKUP part : Once we have the position, we are comparing if the position found in ROWS() is lower than or equal to the first frequency (why whe use the first 1).
If it is, it will display the element associated to that frequency.
If it is bigger than the first frequency, it will check for the second frequency and so on.
If we are further than the max combined frequency it will show "All frequencies met".
The last 1 is unecessary for this function.
Upvotes: 0
Reputation:
I gather by universal, you want this to accommodate any number of columns and any number of entries in each. A few variant arrays should provide the dimensions necessary to calculate the cycles of repetition for each value.
Option Explicit
Sub main()
Call for_each_in_others(rDATA:=Worksheets("Sheet3").Range("A3"), bHDR:=True)
End Sub
Sub for_each_in_others(rDATA As Range, Optional bHDR As Boolean = False)
Dim v As Long, w As Long
Dim iINCROWS As Long, iMAXROWS As Long, sErrorRng As String
Dim vVALs As Variant, vTMPs As Variant, vCOLs As Variant
On Error GoTo bm_Safe_Exit
appTGGL bTGGL:=False
With rDATA.Parent
With rDATA(1).CurrentRegion
'Debug.Print rDATA(1).Row - .Cells(1).Row
With .Resize(.Rows.Count - (rDATA(1).Row - .Cells(1).Row), .Columns.Count).Offset(2, 0)
sErrorRng = .Address(0, 0)
vTMPs = .Value2
ReDim vCOLs(LBound(vTMPs, 2) To UBound(vTMPs, 2))
iMAXROWS = 1
'On Error GoTo bm_Output_Exceeded
For w = LBound(vTMPs, 2) To UBound(vTMPs, 2)
vCOLs(w) = Application.CountA(.Columns(w))
iMAXROWS = iMAXROWS * vCOLs(w)
Next w
'control excessive or no rows of output
If iMAXROWS > Rows.Count Then
GoTo bm_Output_Exceeded
ElseIf .Columns.Count = 1 Or iMAXROWS = 0 Then
GoTo bm_Nothing_To_Do
End If
On Error GoTo bm_Safe_Exit
ReDim vVALs(LBound(vTMPs, 1) To iMAXROWS, LBound(vTMPs, 2) To UBound(vTMPs, 2))
iINCROWS = 1
For w = LBound(vVALs, 2) To UBound(vVALs, 2)
iINCROWS = iINCROWS * vCOLs(w)
For v = LBound(vVALs, 1) To UBound(vVALs, 1)
vVALs(v, w) = vTMPs((Int(iINCROWS * ((v - 1) / UBound(vVALs, 1))) Mod vCOLs(w)) + 1, w)
Next v
Next w
End With
End With
.Cells(2, UBound(vVALs, 2) + 2).Resize(1, UBound(vVALs, 2) + 2).EntireColumn.Delete
If bHDR Then
rDATA.Cells(1, 1).Offset(-1, 0).Resize(1, UBound(vVALs, 2)).Copy _
Destination:=rDATA.Cells(1, UBound(vVALs, 2) + 2).Offset(-1, 0)
End If
rDATA.Cells(1, UBound(vVALs, 2) + 2).Resize(UBound(vVALs, 1), UBound(vVALs, 2)) = vVALs
End With
GoTo bm_Safe_Exit
bm_Nothing_To_Do:
MsgBox "There is not enough data in " & sErrorRng & " to perform expansion." & Chr(10) & _
"This could be due to a single column of values or one or more blank column(s) of values." & _
Chr(10) & Chr(10) & "There is nothing to expand.", vbInformation, _
"Single or No Column of Raw Data"
GoTo bm_Safe_Exit
bm_Output_Exceeded:
MsgBox "The number of expanded values created from " & sErrorRng & _
" (" & Format(iMAXROWS, "\> #, ##0") & " rows × " & UBound(vTMPs, 2) & _
" columns) exceeds the rows available (" & Format(Rows.Count, "#, ##0") & ") on this worksheet.", vbCritical, _
"Too Many Entries"
bm_Safe_Exit:
appTGGL
End Sub
Sub appTGGL(Optional bTGGL As Boolean = True)
Application.EnableEvents = bTGGL
Application.ScreenUpdating = bTGGL
End Sub
Put the column header labels in row 2 starting in column A and the data directly below that.
I have added some error control to warn of exceeding the number of rows on a worksheet. This is not normally something that is likely to be a consideration but multiplying the number of values in an undetermined number of columns against each other can quickly produce a large number of results. It is not unforeseeable that you would exceed 1,048,576 rows.
Upvotes: 15
Reputation: 5991
My first approach to this problem was similar to the one posted by @Jeeped:
Using MicroTimer I have calculated average times taken by each part of the above algorithm. Part 3. took 90%-93% of total execution time for bigger input data.
Below is my attempt to improve the speed of writing data to worksheet. I have defined a constant iMinRSize=17
. Once it is possible to fill more than iMinRSize
consecutive rows with the same value, the code stops filiing array and writes directly to worksheet range.
Sub CrossJoin(rSrc As Range, rTrg As Range)
Dim vSrc() As Variant, vTrgPart() As Variant
Dim iLengths() As Long
Dim iCCnt As Integer, iRTrgCnt As Long, iRSrcCnt As Long
Dim i As Integer, j As Long, k As Long, l As Long
Dim iStep As Long
Const iMinRSize As Long = 17
Dim iArrLastC As Integer
On Error GoTo CleanUp
Application.ScreenUpdating = False
Application.EnableEvents = False
vSrc = rSrc.Value2
iCCnt = UBound(vSrc, 2)
iRSrcCnt = UBound(vSrc, 1)
iRTrgCnt = 1
iArrLastC = 1
ReDim iLengths(1 To iCCnt)
For i = 1 To iCCnt
j = iRSrcCnt
While (j > 0) And IsEmpty(vSrc(j, i))
j = j - 1
Wend
iLengths(i) = j
iRTrgCnt = iRTrgCnt * iLengths(i)
If (iRTrgCnt < iMinRSize) And (iArrLastC < iCCnt) Then iArrLastC = iArrLastC + 1
Next i
If (iRTrgCnt > 0) And (rTrg.row + iRTrgCnt - 1 <= rTrg.Parent.Rows.Count) Then
ReDim vTrgPart(1 To iRTrgCnt, 1 To iArrLastC)
iStep = 1
For i = 1 To iArrLastC
k = 0
For j = 1 To iRTrgCnt Step iStep
k = k + 1
If k > iLengths(i) Then k = 1
For l = j To j + iStep - 1
vTrgPart(l, i) = vSrc(k, i)
Next l
Next j
iStep = iStep * iLengths(i)
Next i
rTrg.Resize(iRTrgCnt, iArrLastC) = vTrgPart
For i = iArrLastC + 1 To iCCnt
k = 0
For j = 1 To iRTrgCnt Step iStep
k = k + 1
If k > iLengths(i) Then k = 1
rTrg.Resize(iStep).Offset(j - 1, i - 1).Value2 = vSrc(k, i)
Next j
iStep = iStep * iLengths(i)
Next i
End If
CleanUp:
Application.ScreenUpdating = True
Application.EnableEvents = False
End Sub
Sub test()
CrossJoin Range("a2:f10"), Range("k2")
End Sub
If we set iMinRSize
to Rows.Count
, all data is written to array. Below are my sample test results:
The code works best if input columns with highest number of rows come first, but it wouldn't be a big problem to modify code to rank columns and process in right order.
Upvotes: 12
Reputation: 2526
Actually, I want to modify my old answer. But, my new answer is fully differ from old answer. Because, old answer is for specific column and this one is for universal column. After answering the old answer, the questioner say new requirement which he want to do it in universal. For fixed column, we can think fixed looping and for infinite column, we need to think from another way. So, I also do it. And SO users also can see the code differences and I think, this will be helpful for beginners.
This new code is not so simple like the old one. If you want to know clearly about code, I suggested for debug the code in line by line.
Don't worry about the code. I already tested about it in step by step. It perfectly work for me. If it is not for you, please let me know. One things is that this code can cause error for blank row(which has no data). Because, currently, I not added checking for that.
Here is my universal approach for your problem:
Public Sub matchingCell()
Dim startRawColumn, endRawColumn, startResultColumn, endResultColumn, startRow As Integer
Dim index, row, column, containerIndex, tempIndex As Integer
Dim columnCount, totalCount, timesCount, matchingCount, tempCount As Integer
Dim isExist As Boolean
Dim arrayContainer() As Variant
'Actually, even it is for universal, we need to know start column and end column of raw data.
'And also start row. And start column for write result.
'I set them for my test data.
'You need to modify them(startRawColumn, endRawColumn, startRow, startResultColumn).
'Set the start column and end column for raw data
startRawColumn = 1
endRawColumn = 3
'Set the start row for read data and write data
startRow = 2
'Set the start column for result data
startResultColumn = 4
'Get no of raw data column
columnCount = endRawColumn - startRawColumn
'Set container index
containerIndex = 0
'Re-create array container for count of column
ReDim arrayContainer(0 To columnCount)
With Sheets("sheetname")
'Getting data from sheet
'Loop all column for getting data of each column
For column = startRawColumn To endRawColumn Step 1
'Create tempArray for column
Dim tempArray() As Variant
'Reset startRow
row = startRow
'Reset index
index = 0
'Here is one things. I looped until to blank.
'If you want anymore, you can modify the looping type.
'Don't do any changes to main body of looping.
'Loop until the cell is blank
Do While .Cells(row, column) <> ""
'Reset isExist flag
isExist = False
'Remove checking for no data
If index > 0 Then
'Loop previous data for duplicate checking
For tempIndex = 0 To index - 1 Step 1
'If found, set true to isExist and stop loop
If tempArray(tempIndex) = .Cells(row, column) Then
isExist = True
Exit For
End If
Next tempIndex
End If
'If there is no duplicate data, store data
If Not isExist Then
'Reset tempArray
ReDim Preserve tempArray(index)
tempArray(index) = .Cells(row, column)
'Increase index
index = index + 1
End If
'Increase row
row = row + 1
Loop
'Store column with data
arrayContainer(containerIndex) = tempArray
'Increase container index
containerIndex = containerIndex + 1
Next column
'Now, we got all data column including data which has no duplicate
'Show result data on sheet
'Getting the result row count
totalCount = 1
'Get result row count
For tempIndex = 0 To UBound(arrayContainer) Step 1
totalCount = totalCount * (UBound(arrayContainer(tempIndex)) + 1)
Next tempIndex
'Reset timesCount
timesCount = 1
'Get the last column for result
endResultColumn = startResultColumn + columnCount
'Loop array container
For containerIndex = UBound(arrayContainer) To 0 Step -1
'Getting the counts for looping
If containerIndex = UBound(arrayContainer) Then
duplicateCount = 1
timesCount = totalCount / (UBound(arrayContainer(containerIndex)) + 1)
Else
duplicateCount = duplicateCount * (UBound(arrayContainer(containerIndex + 1)) + 1)
timesCount = timesCount / (UBound(arrayContainer(containerIndex)) + 1)
End If
'Reset the start row
row = startRow
'Loop timesCount
For countIndex = 1 To timesCount Step 1
'Loop data array
For index = 0 To UBound(arrayContainer(containerIndex)) Step 1
'Loop duplicateCount
For tempIndex = 1 To duplicateCount Step 1
'Write data to cell
.Cells(row, endResultColumn) = arrayContainer(containerIndex)(index)
'Increase row
row = row + 1
Next tempIndex
Next index
Next countIndex
'Increase result column index
endResultColumn = endResultColumn - 1
Next containerIndex
End With
End Sub
Upvotes: 4
Reputation: 107652
Classic example of a non-join select SQL statement which returns the Cartesian Product of all combination outcomes of listed tables.
SQL Database Solution
Simply import Animals, Fruit, Country as separate tables into any SQL database like MS Access, SQLite, MySQL, etc. and list tables without joins including implicit (WHERE
) and explicit (JOIN
) joins:
SELECT Animals.Animal, Fruits.Fruit, Countries.Country
FROM Animals, Countries, Fruits;
Excel Solution
Same concept with running the non-join SQL statement in VBA using an ODBC connection to workbook containing ranges of Animals, Countries, and Fruits. In example, each data grouping is in its own worksheet of same name.
Sub CrossJoinQuery()
Dim conn As Object
Dim rst As Object
Dim sConn As String, strSQL As String
Set conn = CreateObject("ADODB.Connection")
Set rst = CreateObject("ADODB.Recordset")
sConn = "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _
& "DBQ=C:\Path To\Excel\Workbook.xlsx;"
conn.Open sConn
strSQL = "SELECT * FROM [Animals$A1:A3], [Fruits$A1:A3], [Countries$A1:A3] "
rst.Open strSQL, conn
Range("A1").CopyFromRecordset rst
rst.Close
conn.Close
Set rst = Nothing
Set conn = Nothing
End Sub
Upvotes: 14
Reputation: 51998
Here is a recursive version. It assumes that the data doesn't contain any internal tabs since the core function returns product strings which are tab-delimited. The main sub needs to be passed a range consisting of the data together with the upper left-hand corner cell of the output range. This could probably be tweaked a bit but is adequate for testing purposes.
ColumnProducts Range("A:C"), Range("E1")
Is the call that solves the OP problem. Here is the code:
'the following function takes a collection of arrays of strings
'and returns a variant array of tab-delimited strings which
'comprise the (tab-delimited) cartesian products of
'the arrays in the collection
Function CartesianProduct(ByVal Arrays As Collection) As Variant
Dim i As Long, j As Long, k As Long, m As Long, n As Long
Dim head As Variant
Dim tail As Variant
Dim product As Variant
If Arrays.Count = 1 Then
CartesianProduct = Arrays.Item(1)
Exit Function
Else
head = Arrays.Item(1)
Arrays.Remove 1
tail = CartesianProduct(Arrays)
m = UBound(head)
n = UBound(tail)
ReDim product(1 To m * n)
k = 1
For i = 1 To m
For j = 1 To n
product(k) = head(i) & vbTab & tail(j)
k = k + 1
Next j
Next i
CartesianProduct = product
End If
End Function
Sub ColumnProducts(data As Range, output As Range)
Dim Arrays As New Collection
Dim strings As Variant, product As Variant
Dim i As Long, j As Long, n As Long, numRows As Long
Dim col As Range, cell As Range
Dim outRange As Range
numRows = Range("A:A").Rows.Count
For Each col In data.Columns
n = col.EntireColumn.Cells(numRows).End(xlUp).Row
i = col.Cells(1).Row
ReDim strings(1 To n - i + 1)
For j = 1 To n - i + 1
strings(j) = col.Cells(i + j - 1)
Next j
Arrays.Add strings
Next col
product = CartesianProduct(Arrays)
n = UBound(product)
Set outRange = Range(output, output.Offset(n - 1))
outRange.Value = Application.WorksheetFunction.Transpose(product)
outRange.TextToColumns Destination:=output, DataType:=xlDelimited, Tab:=True
End Sub
Upvotes: 2
Reputation: 60224
You can do this with worksheet formulas. If you have NAME'd ranges -- Animals, Fruits and Countries, the "trick" is to generate indexes into that array to provide all the various combinations.
For example:
=CEILING(ROWS($1:1)/(ROWS(Fruits)*ROWS(Countries)),1)
will generate a 1-based series of numbers that repeats for the number entries in Fruits * Countries -- which gives you how many rows you need for each animal.
=MOD(CEILING(ROWS($1:1)/ROWS(Countries),1)-1,ROWS(Fruits))+1
will generate a 1-based series that repeats each Fruit for the number of countries.
=MOD(ROWS($1:1)-1,ROWS(Countries))+1))
Generates a repeating sequence of 1..n where n is the number of countries.
Putting these into formulas (with some error checking)
D3: =IFERROR(INDEX(Animals,CEILING(ROWS($1:1)/(ROWS(Fruits)*ROWS(Countries)),1)),"")
E3: =IF(E3="","",INDEX(Fruits,MOD(CEILING(ROWS($1:1)/ROWS(Countries),1)-1,ROWS(Fruits))+1))
F3: =IF(E3="","",INDEX(Countries,MOD(ROWS($1:1)-1,ROWS(Countries))+1))
Upvotes: 7
Reputation: 2526
Here, my approach for your problem.
Public Sub matchingCell()
Dim animalRow, fruitRow, countryRow, checkRow, resultRow As Long
Dim isExist As Boolean
'Set the start row
animalRow = 2
resultRow = 2
'Work with data sheet
With Sheets("sheetname")
'Loop until animals column is blank
Do While .Range("A" & animalRow) <> ""
'Set the start row
fruitRow = 2
'Loop until fruits column is blank
Do While .Range("B" & fruitRow) <> ""
'Set the start row
countryRow = 2
'Loop until country column is blank
Do While .Range("C" & countryRow) <> ""
'Set the start row
checkRow = 2
'Reset flag
isExist = False
'Checking for duplicate row
'Loop all result row until D is blank
Do While .Range("D" & checkRow) <> ""
'If duplicate row found
If .Range("D" & checkRow) = .Range("A" & animalRow) And _
.Range("E" & checkRow) = .Range("B" & fruitRow) And _
.Range("F" & checkRow) = .Range("C" & countryRow) Then
'Set true for exist flag
isExist = True
End If
checkRow = checkRow + 1
Loop
'If duplicate row not found
If Not isExist Then
.Range("D" & resultRow) = .Range("A" & animalRow)
.Range("E" & resultRow) = .Range("B" & fruitRow)
.Range("F" & resultRow) = .Range("C" & countryRow)
'Increase resultRow
resultRow = resultRow + 1
End If
'Increase countryRow
countryRow = countryRow + 1
Loop
'Increase fruitRow
fruitRow = fruitRow + 1
Loop
'Increase fruitRow
animalRow = animalRow + 1
Loop
End With
End Sub
I already tested it. It work well. Have a nice day.
Upvotes: 1
Reputation: 3914
Okay, so you just want a list of all possible combinations. Here is what I would do:
Upvotes: 1