mysticous
mysticous

Reputation: 177

Expanding column cells for each column cell

I have 3 different sets of data (in different columns)

  1. Animals (5 different kinds) in column A
  2. Fruits (1000 different kinds) in column B
  3. Countries (10 different kinds) in column C

With these 3 data collections I would like to receive 5×1000×10 for a total of 50k corresponding elements in col. E F G (each animal who corresponds with each fruit and each country).

It might be done by manually copying and pasting values, but it will take ages. Is there any way to automate it by VBA code or

Is there any universal formula for unlimited data sets like the one presented above? Please let me know if something is not clear.

Here is a smaller example of data and how the results should turn out:

        Expanding data sets for each in other

Upvotes: 15

Views: 3228

Answers (9)

Hydre
Hydre

Reputation: 1

First you will need to put the data as follows: How to put your data

You will add a new column in which you will add the frequencies together. Do a simple recursive formula. (ex: f3+f4)

To take it to the modern version of Excel and the new function Xlookup, I propose this formula: =XLOOKUP(ROWS(K$2[a]:K2),$I$3:$I$8[b],$H$3:$H$8[c],"All frequencies met",1,1

where:

[a] is the column in which you want the data to be displayed. It is important to lock the number

[b] is the added frequencies together

[c] is the element to show at that frequency

How it works ? :

ROWS(K$2[a]:K2) : will determine the position in your column. At the first cell, it will consider itself at the first position. Next cell, it will be second, and so on.

XLOOKUP part : Once we have the position, we are comparing if the position found in ROWS() is lower than or equal to the first frequency (why whe use the first 1).

If it is, it will display the element associated to that frequency.

If it is bigger than the first frequency, it will check for the second frequency and so on.

If we are further than the max combined frequency it will show "All frequencies met".

The last 1 is unecessary for this function.

Upvotes: 0

user4039065
user4039065

Reputation:

I gather by universal, you want this to accommodate any number of columns and any number of entries in each. A few variant arrays should provide the dimensions necessary to calculate the cycles of repetition for each value.

Option Explicit

Sub main()
    Call for_each_in_others(rDATA:=Worksheets("Sheet3").Range("A3"), bHDR:=True)
End Sub

Sub for_each_in_others(rDATA As Range, Optional bHDR As Boolean = False)
    Dim v As Long, w As Long
    Dim iINCROWS As Long, iMAXROWS As Long, sErrorRng As String
    Dim vVALs As Variant, vTMPs As Variant, vCOLs As Variant

    On Error GoTo bm_Safe_Exit
    appTGGL bTGGL:=False

    With rDATA.Parent
        With rDATA(1).CurrentRegion
            'Debug.Print rDATA(1).Row - .Cells(1).Row
            With .Resize(.Rows.Count - (rDATA(1).Row - .Cells(1).Row), .Columns.Count).Offset(2, 0)
                sErrorRng = .Address(0, 0)
                vTMPs = .Value2
                ReDim vCOLs(LBound(vTMPs, 2) To UBound(vTMPs, 2))
                iMAXROWS = 1
                'On Error GoTo bm_Output_Exceeded
                For w = LBound(vTMPs, 2) To UBound(vTMPs, 2)
                    vCOLs(w) = Application.CountA(.Columns(w))
                    iMAXROWS = iMAXROWS * vCOLs(w)
                Next w

                'control excessive or no rows of output
                If iMAXROWS > Rows.Count Then
                    GoTo bm_Output_Exceeded
                ElseIf .Columns.Count = 1 Or iMAXROWS = 0 Then
                    GoTo bm_Nothing_To_Do
                End If

                On Error GoTo bm_Safe_Exit
                ReDim vVALs(LBound(vTMPs, 1) To iMAXROWS, LBound(vTMPs, 2) To UBound(vTMPs, 2))
                iINCROWS = 1
                For w = LBound(vVALs, 2) To UBound(vVALs, 2)
                    iINCROWS = iINCROWS * vCOLs(w)
                    For v = LBound(vVALs, 1) To UBound(vVALs, 1)
                        vVALs(v, w) = vTMPs((Int(iINCROWS * ((v - 1) / UBound(vVALs, 1))) Mod vCOLs(w)) + 1, w)
                    Next v
                Next w
            End With
        End With
        .Cells(2, UBound(vVALs, 2) + 2).Resize(1, UBound(vVALs, 2) + 2).EntireColumn.Delete
        If bHDR Then
            rDATA.Cells(1, 1).Offset(-1, 0).Resize(1, UBound(vVALs, 2)).Copy _
                Destination:=rDATA.Cells(1, UBound(vVALs, 2) + 2).Offset(-1, 0)
        End If
        rDATA.Cells(1, UBound(vVALs, 2) + 2).Resize(UBound(vVALs, 1), UBound(vVALs, 2)) = vVALs
    End With

    GoTo bm_Safe_Exit
bm_Nothing_To_Do:
    MsgBox "There is not enough data in  " & sErrorRng & " to perform expansion." & Chr(10) & _
           "This could be due to a single column of values or one or more blank column(s) of values." & _
            Chr(10) & Chr(10) & "There is nothing to expand.", vbInformation, _
           "Single or No Column of Raw Data"
    GoTo bm_Safe_Exit
bm_Output_Exceeded:
    MsgBox "The number of expanded values created from " & sErrorRng & _
           " (" & Format(iMAXROWS, "\> #, ##0") & " rows × " & UBound(vTMPs, 2) & _
           " columns) exceeds the rows available (" & Format(Rows.Count, "#, ##0") & ") on this worksheet.", vbCritical, _
           "Too Many Entries"
bm_Safe_Exit:
    appTGGL
End Sub

Sub appTGGL(Optional bTGGL As Boolean = True)
    Application.EnableEvents = bTGGL
    Application.ScreenUpdating = bTGGL
End Sub

Put the column header labels in row 2 starting in column A and the data directly below that.

I have added some error control to warn of exceeding the number of rows on a worksheet. This is not normally something that is likely to be a consideration but multiplying the number of values in an undetermined number of columns against each other can quickly produce a large number of results. It is not unforeseeable that you would exceed 1,048,576 rows.

        Variant Array expansion

Upvotes: 15

BrakNicku
BrakNicku

Reputation: 5991

My first approach to this problem was similar to the one posted by @Jeeped:

  1. load input columns to array and count rows in each column
  2. fill array with all combinations
  3. assign array to output range

Using MicroTimer I have calculated average times taken by each part of the above algorithm. Part 3. took 90%-93% of total execution time for bigger input data.

Below is my attempt to improve the speed of writing data to worksheet. I have defined a constant iMinRSize=17. Once it is possible to fill more than iMinRSize consecutive rows with the same value, the code stops filiing array and writes directly to worksheet range.

Sub CrossJoin(rSrc As Range, rTrg As Range)

  Dim vSrc() As Variant, vTrgPart() As Variant
  Dim iLengths() As Long
  Dim iCCnt As Integer, iRTrgCnt As Long, iRSrcCnt As Long
  Dim i As Integer, j As Long, k As Long, l As Long
  Dim iStep As Long

  Const iMinRSize As Long = 17
  Dim iArrLastC As Integer

  On Error GoTo CleanUp
  Application.ScreenUpdating = False
  Application.EnableEvents = False

  vSrc = rSrc.Value2
  iCCnt = UBound(vSrc, 2)
  iRSrcCnt = UBound(vSrc, 1)
  iRTrgCnt = 1
  iArrLastC = 1
  ReDim iLengths(1 To iCCnt)
  For i = 1 To iCCnt
    j = iRSrcCnt
    While (j > 0) And IsEmpty(vSrc(j, i))
      j = j - 1
    Wend
    iLengths(i) = j
    iRTrgCnt = iRTrgCnt * iLengths(i)
    If (iRTrgCnt < iMinRSize) And (iArrLastC < iCCnt) Then iArrLastC = iArrLastC + 1
  Next i

  If (iRTrgCnt > 0) And (rTrg.row + iRTrgCnt - 1 <= rTrg.Parent.Rows.Count) Then
    ReDim vTrgPart(1 To iRTrgCnt, 1 To iArrLastC)

    iStep = 1
    For i = 1 To iArrLastC
      k = 0
      For j = 1 To iRTrgCnt Step iStep
        k = k + 1
        If k > iLengths(i) Then k = 1
        For l = j To j + iStep - 1
          vTrgPart(l, i) = vSrc(k, i)
        Next l
      Next j
      iStep = iStep * iLengths(i)
    Next i

    rTrg.Resize(iRTrgCnt, iArrLastC) = vTrgPart

    For i = iArrLastC + 1 To iCCnt
      k = 0
      For j = 1 To iRTrgCnt Step iStep
        k = k + 1
        If k > iLengths(i) Then k = 1
        rTrg.Resize(iStep).Offset(j - 1, i - 1).Value2 = vSrc(k, i)
      Next j
      iStep = iStep * iLengths(i)
    Next i
  End If

CleanUp:
  Application.ScreenUpdating = True
  Application.EnableEvents = False
End Sub

Sub test()
  CrossJoin Range("a2:f10"), Range("k2")
End Sub

If we set iMinRSize to Rows.Count, all data is written to array. Below are my sample test results:

enter image description here

The code works best if input columns with highest number of rows come first, but it wouldn't be a big problem to modify code to rank columns and process in right order.

Upvotes: 12

R.Katnaan
R.Katnaan

Reputation: 2526

Actually, I want to modify my old answer. But, my new answer is fully differ from old answer. Because, old answer is for specific column and this one is for universal column. After answering the old answer, the questioner say new requirement which he want to do it in universal. For fixed column, we can think fixed looping and for infinite column, we need to think from another way. So, I also do it. And SO users also can see the code differences and I think, this will be helpful for beginners.

This new code is not so simple like the old one. If you want to know clearly about code, I suggested for debug the code in line by line.

Don't worry about the code. I already tested about it in step by step. It perfectly work for me. If it is not for you, please let me know. One things is that this code can cause error for blank row(which has no data). Because, currently, I not added checking for that.

Here is my universal approach for your problem:

Public Sub matchingCell()

    Dim startRawColumn, endRawColumn, startResultColumn, endResultColumn, startRow As Integer
    Dim index, row, column, containerIndex, tempIndex As Integer
    Dim columnCount, totalCount, timesCount, matchingCount, tempCount As Integer
    Dim isExist As Boolean
    Dim arrayContainer() As Variant

    'Actually, even it is for universal, we need to know start column and end column of raw data.
    'And also start row. And start column for write result.
    'I set them for my test data.
    'You need to modify them(startRawColumn, endRawColumn, startRow, startResultColumn).

    'Set the start column and end column for raw data
    startRawColumn = 1
    endRawColumn = 3

    'Set the start row for read data and write data
    startRow = 2

    'Set the start column for result data
    startResultColumn = 4

    'Get no of raw data column
    columnCount = endRawColumn - startRawColumn

    'Set container index
    containerIndex = 0

    'Re-create array container for count of column
    ReDim arrayContainer(0 To columnCount)

    With Sheets("sheetname")

        'Getting data from sheet

        'Loop all column for getting data of each column
        For column = startRawColumn To endRawColumn Step 1

            'Create tempArray for column
            Dim tempArray() As Variant

            'Reset startRow
            row = startRow

            'Reset index
            index = 0

            'Here is one things. I looped until to blank. 
            'If you want anymore, you can modify the looping type. 
            'Don't do any changes to main body of looping.

            'Loop until the cell is blank
            Do While .Cells(row, column) <> ""

                'Reset isExist flag
                isExist = False

                'Remove checking for no data
                If index > 0 Then

                    'Loop previous data for duplicate checking
                    For tempIndex = 0 To index - 1 Step 1

                        'If found, set true to isExist and stop loop
                        If tempArray(tempIndex) = .Cells(row, column) Then

                            isExist = True

                            Exit For

                        End If

                    Next tempIndex

                End If

                'If there is no duplicate data, store data
                If Not isExist Then

                    'Reset tempArray
                    ReDim Preserve tempArray(index)

                    tempArray(index) = .Cells(row, column)

                    'Increase index
                    index = index + 1

                End If

                'Increase row
                row = row + 1

            Loop

            'Store column with data
            arrayContainer(containerIndex) = tempArray

            'Increase container index
            containerIndex = containerIndex + 1

        Next column

        'Now, we got all data column including data which has no duplicate
        'Show result data on sheet

        'Getting the result row count
        totalCount = 1

        'Get result row count
        For tempIndex = 0 To UBound(arrayContainer) Step 1

            totalCount = totalCount * (UBound(arrayContainer(tempIndex)) + 1)

        Next tempIndex

        'Reset timesCount
        timesCount = 1

        'Get the last column for result
        endResultColumn = startResultColumn + columnCount

        'Loop array container
        For containerIndex = UBound(arrayContainer) To 0 Step -1

            'Getting the counts for looping
            If containerIndex = UBound(arrayContainer) Then

                duplicateCount = 1

                timesCount = totalCount / (UBound(arrayContainer(containerIndex)) + 1)

            Else

                duplicateCount = duplicateCount * (UBound(arrayContainer(containerIndex + 1)) + 1)

                timesCount = timesCount / (UBound(arrayContainer(containerIndex)) + 1)

            End If

            'Reset the start row
            row = startRow

            'Loop timesCount
            For countIndex = 1 To timesCount Step 1

                'Loop data array
                For index = 0 To UBound(arrayContainer(containerIndex)) Step 1

                    'Loop duplicateCount
                    For tempIndex = 1 To duplicateCount Step 1

                        'Write data to cell
                        .Cells(row, endResultColumn) = arrayContainer(containerIndex)(index)

                        'Increase row
                        row = row + 1

                    Next tempIndex

                Next index

            Next countIndex

            'Increase result column index
            endResultColumn = endResultColumn - 1

        Next containerIndex

    End With

End Sub

Upvotes: 4

Parfait
Parfait

Reputation: 107652

Classic example of a non-join select SQL statement which returns the Cartesian Product of all combination outcomes of listed tables.

SQL Database Solution

Simply import Animals, Fruit, Country as separate tables into any SQL database like MS Access, SQLite, MySQL, etc. and list tables without joins including implicit (WHERE) and explicit (JOIN) joins:

SELECT Animals.Animal, Fruits.Fruit, Countries.Country
FROM Animals, Countries, Fruits;

Cartesian SQL

Excel Solution

Same concept with running the non-join SQL statement in VBA using an ODBC connection to workbook containing ranges of Animals, Countries, and Fruits. In example, each data grouping is in its own worksheet of same name.

Sub CrossJoinQuery()

    Dim conn As Object
    Dim rst As Object
    Dim sConn As String, strSQL As String

    Set conn = CreateObject("ADODB.Connection")
    Set rst = CreateObject("ADODB.Recordset")

    sConn = "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _
               & "DBQ=C:\Path To\Excel\Workbook.xlsx;"
    conn.Open sConn

    strSQL = "SELECT * FROM [Animals$A1:A3], [Fruits$A1:A3], [Countries$A1:A3] "
    rst.Open strSQL, conn

    Range("A1").CopyFromRecordset rst

    rst.Close
    conn.Close

    Set rst = Nothing
    Set conn = Nothing

End Sub

Cartesian SQL in VBA

Upvotes: 14

John Coleman
John Coleman

Reputation: 51998

Here is a recursive version. It assumes that the data doesn't contain any internal tabs since the core function returns product strings which are tab-delimited. The main sub needs to be passed a range consisting of the data together with the upper left-hand corner cell of the output range. This could probably be tweaked a bit but is adequate for testing purposes.

ColumnProducts Range("A:C"), Range("E1")

Is the call that solves the OP problem. Here is the code:

'the following function takes a collection of arrays of strings
'and returns a variant array of tab-delimited strings which
'comprise the (tab-delimited) cartesian products of
'the arrays in the collection

Function CartesianProduct(ByVal Arrays As Collection) As Variant
    Dim i As Long, j As Long, k As Long, m As Long, n As Long
    Dim head As Variant
    Dim tail As Variant
    Dim product As Variant

    If Arrays.Count = 1 Then
        CartesianProduct = Arrays.Item(1)
        Exit Function
    Else
        head = Arrays.Item(1)
        Arrays.Remove 1
        tail = CartesianProduct(Arrays)
        m = UBound(head)
        n = UBound(tail)
        ReDim product(1 To m * n)
        k = 1
        For i = 1 To m
            For j = 1 To n
                product(k) = head(i) & vbTab & tail(j)
                k = k + 1
            Next j
        Next i
        CartesianProduct = product
    End If
End Function

Sub ColumnProducts(data As Range, output As Range)
    Dim Arrays As New Collection
    Dim strings As Variant, product As Variant
    Dim i As Long, j As Long, n As Long, numRows As Long
    Dim col As Range, cell As Range
    Dim outRange As Range

    numRows = Range("A:A").Rows.Count
    For Each col In data.Columns
        n = col.EntireColumn.Cells(numRows).End(xlUp).Row
        i = col.Cells(1).Row
        ReDim strings(1 To n - i + 1)
        For j = 1 To n - i + 1
            strings(j) = col.Cells(i + j - 1)
        Next j
        Arrays.Add strings
    Next col
    product = CartesianProduct(Arrays)
    n = UBound(product)
    Set outRange = Range(output, output.Offset(n - 1))
    outRange.Value = Application.WorksheetFunction.Transpose(product)
    outRange.TextToColumns Destination:=output, DataType:=xlDelimited, Tab:=True
End Sub

Upvotes: 2

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60224

You can do this with worksheet formulas. If you have NAME'd ranges -- Animals, Fruits and Countries, the "trick" is to generate indexes into that array to provide all the various combinations.

For example:

=CEILING(ROWS($1:1)/(ROWS(Fruits)*ROWS(Countries)),1)

will generate a 1-based series of numbers that repeats for the number entries in Fruits * Countries -- which gives you how many rows you need for each animal.

=MOD(CEILING(ROWS($1:1)/ROWS(Countries),1)-1,ROWS(Fruits))+1

will generate a 1-based series that repeats each Fruit for the number of countries.

=MOD(ROWS($1:1)-1,ROWS(Countries))+1))

Generates a repeating sequence of 1..n where n is the number of countries.

Putting these into formulas (with some error checking)

D3:  =IFERROR(INDEX(Animals,CEILING(ROWS($1:1)/(ROWS(Fruits)*ROWS(Countries)),1)),"")
E3:  =IF(E3="","",INDEX(Fruits,MOD(CEILING(ROWS($1:1)/ROWS(Countries),1)-1,ROWS(Fruits))+1))
F3:  =IF(E3="","",INDEX(Countries,MOD(ROWS($1:1)-1,ROWS(Countries))+1))

enter image description here

Upvotes: 7

R.Katnaan
R.Katnaan

Reputation: 2526

Here, my approach for your problem.

Public Sub matchingCell()

    Dim animalRow, fruitRow, countryRow, checkRow, resultRow As Long
    Dim isExist As Boolean

    'Set the start row
    animalRow = 2
    resultRow = 2

    'Work with data sheet
    With Sheets("sheetname")

        'Loop until animals column is blank
        Do While .Range("A" & animalRow) <> ""

            'Set the start row
            fruitRow = 2

            'Loop until fruits column is blank
            Do While .Range("B" & fruitRow) <> ""

                'Set the start row
                countryRow = 2

                'Loop until country column is blank
                Do While .Range("C" & countryRow) <> ""

                    'Set the start row
                    checkRow = 2

                    'Reset flag
                    isExist = False

                    'Checking for duplicate row
                    'Loop all result row until D is blank
                    Do While .Range("D" & checkRow) <> ""

                        'If duplicate row found
                        If .Range("D" & checkRow) = .Range("A" & animalRow) And _
                           .Range("E" & checkRow) = .Range("B" & fruitRow) And _
                           .Range("F" & checkRow) = .Range("C" & countryRow) Then

                           'Set true for exist flag
                           isExist = True

                        End If

                        checkRow = checkRow + 1

                    Loop

                    'If duplicate row not found
                    If Not isExist Then

                        .Range("D" & resultRow) = .Range("A" & animalRow)
                        .Range("E" & resultRow) = .Range("B" & fruitRow)
                        .Range("F" & resultRow) = .Range("C" & countryRow)

                        'Increase resultRow
                        resultRow = resultRow + 1

                    End If

                    'Increase countryRow
                    countryRow = countryRow + 1

                Loop

                'Increase fruitRow
                fruitRow = fruitRow + 1

            Loop

            'Increase fruitRow
            animalRow = animalRow + 1

        Loop

    End With

End Sub

I already tested it. It work well. Have a nice day.

Upvotes: 1

Luuklag
Luuklag

Reputation: 3914

Okay, so you just want a list of all possible combinations. Here is what I would do:

  • First select the raw data and remove duplicates, column by column.
  • Then read these 3 columns into 3 separate arrays.
  • Calculate the total length of all arrays.
  • Then with a loop paste the first value of the country array as many times as there are combinations of animals and fruits, so the length of those arrays multiplied.
  • Within the loop make another loop that posts all options of fruits. With a number of duplicate rows that is equal to the maximum number of animals.
  • Then paste the animals without duplicates following each other till the last row of the table.

Upvotes: 1

Related Questions