Raghavakrishna
Raghavakrishna

Reputation: 101

VBA excel column matching

I have an excel sheet with values like this:

R1:A 1 0 1 1 0 1
R2:B 0 0 1 1 0 0
R3:C 1 0 1 1 0 1
R4:D 1 0 1 1 0 1
R5:E 0 0 1 1 0 0
R-row

OUTPUT:
A,C,D(since they have matching columns)
B,E

I need a VBA script that groups the columns based on matching values.I need to run this on a large set of data(say 417 columns) and n rows ,please be more generalised. Please help me out.Thanks in advance.

Upvotes: 0

Views: 2350

Answers (1)

Tony Dallimore
Tony Dallimore

Reputation: 12413

Chris is absolutely right; questions that sound like "Please solve my entire problem" are not well received here.

I assume you know little or no VBA and do not know where to start with this problem. If you enter "Excel VBA tutorial" in your favourite search engine, you will be offered a selection of tutorials. Try a few, pick the one you like best and work through it systematically. You will be surprised how quickly you build up a good understanding.

However, to give you a start, I will build a tutorial around your problem. I will not say much about individual statements since it is easy to look them up with VB Help or a search engine. For example, the first statement is Option Explicit. Type "excel vba option explicit" into a search engine and you will be given a selection of pages which explain what this statement does and why it is a good idea to include it.

I assume you know how to open Excel, open the VB Editor, create a module and execute macros. If not, these will be the first things any tutorial on the internet will explain.

I have created a workbook with worksheet Input. I have loaded Input with the following data:

Sample data

Rows 2 to 6 match your data. I have added a header row and some more data rows of different lengths. You ask for a generalised solution but I do not know how generalised. This may be more or less than you seek. Create a similar worksheet or modify the code below to your requirements.

Copy the following macro, Test1, to a VB module and run it.

Option Explicit
Sub Test1()

  Dim ColMax As Long
  Dim RowMax As Long

  With Worksheets("Input")

    ' There are many different ways of identifying the last used row and
    ' column.  SpecialCells has a selection of parameters and is worth
    ' knowing so I have decided to use it to identify the last row and
    ' column.

    ColMax = .Cells.SpecialCells(xlCellTypeLastCell).Column
    RowMax = .Cells.SpecialCells(xlCellTypeLastCell).Row

    ' Debug.Print outputs values to the Immediate Window which will be at the
    ' bottom of the VB Editor window.  If the Immediate Window is is missing,
    ' click Ctrl+G.

    Debug.Print "Last used column " & ColMax
    Debug.Print "Last used row " & RowMax

  End With

End Sub

With my data, the macro output the following to the Immediate Window:

Last used column 10
Last used row 13

Column "J" is the 10th column. This code identifies the last row and column used which I must know if my macro is to examine the correct number of rows and columns. Outputting values to the Immediate Window is an easy way of checking your code.

Now add macro Test2 to the module and run it:

Sub Test2()

  Dim ColCrnt As Long
  Dim ColMax As Long

  With Worksheets("Input")

    ColMax = .Cells.SpecialCells(xlCellTypeLastCell).Column

    Debug.Print "Row 1:"
    For ColCrnt = 1 To ColMax
      Debug.Print "Col " & ColCrnt & "=" & .Cells(1, ColCrnt).Value & "  ";
      If ColCrnt Mod 5 = 0 Then
        Debug.Print
      End If
    Next

  End With

End Sub

With my data, the macro output the following to the Immediate Window:

Row 1:
Col 1=Id  Col 2=Value 1  Col 3=Value 2  Col 4=Value 3  Col 5=Value 4  
Col 6=Value 5  Col 7=Value 6  Col 8=Value 7  Col 9=Value 8  Col 10=Value 9  

I will still expect you to use VB Help or the internet to get a description of the statements I have used but some explanation of what I am doing is necessary.

Consider:

ColMax = Worksheets("Input").Cells.SpecialCells(xlCellTypeLastCell).Column

In macro Test2 I had separated Worksheets("Input") from the rest of the string into a With Statement. This makes the code faster, clearer and smaller but I could have written a single string as here.

Worksheets("Input") references the entire worksheet.

Worksheets("Input").X references part X of the worksheet. I could have referenced the charts or the default values but I wanted to reference the cells. Worksheets("Input").Cells references all cells within the worksheet.

Worksheets("Input").Cells.X references part of the cells or a method that operates on cells. Worksheets("Input").Cells.Sort, for example, would allow me to sort the worksheets.

Worksheets("Input").Cells.SpecialCells gives me access to one of a set of methods which return information about Worksheets("Input").Cells. Adding a parameter to give: Worksheets("Input").Cells.SpecialCells(xlCellTypeLastCell) says which method I want.

Lastly I add .Column to identify the property I require.

Understanding this dot notation is vital if you are to understand VBA or almost any modern programming language. In X.Y, Y can be part of X, a method that applies to X or a property of X.

Worksheets("Input").Cells(R, C) allows me to access the single cell at row R and column C. R is an integer with a minimum value of 1 and a maximum value that depends on the version of Excel being used. Rows.Count gives the maximum row number for the version you are using. C can be an integer (say 5) or a column code (say "E"). Column "A" is column 1.

Debug.Print Expression outputs Expression to the Immediate Window and follows it with a newline. Debug.Print Expression; outputs Expression to the Immediate Window but does not follow it with a newline.

ColCrnt Mod 5 returns the remainder of ColCrnt divided by 5. By testing for this remainder being 0, I can add a newline every 5 lines.

I have used the for-loop to output every value in row 1.

Although macro Test2 only contains 14 statements, it uses a lot of VBA concepts. Work through it slowly. Use F8 to step through the macro statement by statement and study what each statement does. If you can understand this macro, you know almost everything you need to know to solve your problem.

Now we need to think about matching rows. I will not use an efficient algorithm for matching rows because that would require more complicated VBA. You can enhance the code later once you have built up your knowledge. The approach I will use involves:

  • Comparing row 2 against rows 3, 4, 5, 6, ..., recording matches and recording rows that have been matched against an earlier row.
  • Comparing row 3 against row 6 but not against 4 and 5 because they have already been matched against row 2.

To record matches I need some way of recording that rows 2, 4 and 5 are the same while I go on to discover that the rows 3, 6 and 8 are the same. Having matched row 4 against row 2, I do not want to check row 4 against 5.

I will meet the second requirement with a boolean array:

Dim Matched() As Boolean

ReDim Matched(2 To RowMax)

For RowMast = 2 to RowMax
  Matched(RowMast) = False
Next

In Dim Matched() As Boolean, () says I want a dynamic array. A dynamic array is one in which I can change the upper and lower bounds at run time. VBA is one of the few languages that allow dynamic arrays and is one of even fewer languages that allow you to set the lower bound.

ReDim Matched(2 To RowMax) specifies the lower bound as 2 (= the first data row) and the upper bound as RowMax (= the last data row). You will often see statements such as ReDim Matched(N) which says I want N entries and leaves the compiler to determine the lower bound according the Option Base statement if used. I always specify the lower bound because I do not want someone interfering with my arrays by adding or changing an Option Base statement.

The following sets every element of Matched to False. This is not necessary because most modern languages initialise variables. I remember when this was not the case and prefer to be explicit.

For RowMast = 2 to RowMax
  Matched(RowMast) = False
Next

If P > N > M, when I match rows N and P against row M, I will set Matched(N) and Matched(P) to True so I do not test row N against later rows.

There are many ways of recording matches. I am going to use the crude technique of building a string.

Macro Test3 creates the output the seek. It is not an efficient piece of code but it does the job with the minimum of VBA. Add this macro to the module and run it. The output to the Immediate Window is the output you requested except for the extra rows I have added:

A, C, D
B, E, G
F, J
I, L

Good luck with VBA programming.

Sub Test3()

  Dim ColCrnt As Long
  Dim ColMax As Long
  Dim MatchCrnt As Boolean
  Dim Matched() As Boolean
  Dim MatchStgTotal As String
  Dim MatchStgCrnt As String
  Dim RowMast As Long    ' The master row; the row I am comparing
                         ' against later rows
  Dim RowMax As Long
  Dim RowSub As Long     ' The subordinate row; the row I am comparing
                         ' against an earlier row

  With Worksheets("Input")

    ColMax = .Cells.SpecialCells(xlCellTypeLastCell).Column
    RowMax = .Cells.SpecialCells(xlCellTypeLastCell).Row

    MatchStgTotal = ""      ' No matches discovered yet

    ' Initialise Matched
    ReDim Matched(2 To RowMax)
    For RowMast = 2 To RowMax
      Matched(RowMast) = False
    Next

    For RowMast = 2 To RowMax
      If Not Matched(RowMast) Then
        ' This row has not been matched against an earlier row

        MatchStgCrnt = ""   ' No matches for row RowMast discovered yet

        For RowSub = RowMast + 1 To RowMax
          ' Match row RowMast against every later row

          If Not Matched(RowSub) Then
            ' This row has not been matched against an earlier row

            MatchCrnt = True     ' Assume RowSub matches RowMast
                                 ' until find otherewise

            For ColCrnt = ColMax To 2 Step -1
              ' Compare cells from right to left so rows with different
              ' numbers of values fails to match quickly.  This is the only
              ' consession to efficiency in this loop.  There are much better
              ' ways of doing this but I think I have included enough VBA in
              ' this tutorial.

              If .Cells(RowMast, ColCrnt).Value <> _
                 .Cells(RowSub, ColCrnt).Value Then
                ' These rows do not match
                MatchCrnt = False
                Exit For   ' N point checking further cells
              End If

            Next

            If MatchCrnt Then
              ' Row RowSub matches RowMast

              ' Add this row's Id to the list of matches against RowMast
              MatchStgCrnt = MatchStgCrnt & ", " & .Cells(RowSub, 1).Value
              Matched(RowSub) = True  ' Do not check this row again

            End If
          End If

        Next RowSub

        If MatchStgCrnt <> "" Then
          ' RowMast has been matched against one or more other rows.
          ' MatchCrnt contains a list of those other rows.
          If MatchStgTotal <> "" Then
            ' A previous row have been matched.
            ' Terminate it's string with a newline
            MatchStgTotal = MatchStgTotal & vbLf
          End If
          MatchStgTotal = _
                     MatchStgTotal & .Cells(RowMast, 1).Value & MatchStgCrnt
        End If
      End If
      ' Note: Matched(RowMast) has not been set if row RowMast has been matched
      '       because I will never loook as row RowMast again.
    Next RowMast

  End With

  Debug.Print MatchStgTotal

End Sub

Upvotes: 2

Related Questions