Reputation: 101
I have an excel sheet with values like this:
R1:A 1 0 1 1 0 1
R2:B 0 0 1 1 0 0
R3:C 1 0 1 1 0 1
R4:D 1 0 1 1 0 1
R5:E 0 0 1 1 0 0
R-rowOUTPUT:
A,C,D(since they have matching columns)
B,E
I need a VBA script that groups the columns based on matching values.I need to run this on a large set of data(say 417 columns) and n rows ,please be more generalised. Please help me out.Thanks in advance.
Upvotes: 0
Views: 2350
Reputation: 12413
Chris is absolutely right; questions that sound like "Please solve my entire problem" are not well received here.
I assume you know little or no VBA and do not know where to start with this problem. If you enter "Excel VBA tutorial" in your favourite search engine, you will be offered a selection of tutorials. Try a few, pick the one you like best and work through it systematically. You will be surprised how quickly you build up a good understanding.
However, to give you a start, I will build a tutorial around your problem. I will not say much about individual statements since it is easy to look them up with VB Help or a search engine. For example, the first statement is Option Explicit
. Type "excel vba option explicit" into a search engine and you will be given a selection of pages which explain what this statement does and why it is a good idea to include it.
I assume you know how to open Excel, open the VB Editor, create a module and execute macros. If not, these will be the first things any tutorial on the internet will explain.
I have created a workbook with worksheet Input. I have loaded Input with the following data:
Rows 2 to 6 match your data. I have added a header row and some more data rows of different lengths. You ask for a generalised solution but I do not know how generalised. This may be more or less than you seek. Create a similar worksheet or modify the code below to your requirements.
Copy the following macro, Test1, to a VB module and run it.
Option Explicit
Sub Test1()
Dim ColMax As Long
Dim RowMax As Long
With Worksheets("Input")
' There are many different ways of identifying the last used row and
' column. SpecialCells has a selection of parameters and is worth
' knowing so I have decided to use it to identify the last row and
' column.
ColMax = .Cells.SpecialCells(xlCellTypeLastCell).Column
RowMax = .Cells.SpecialCells(xlCellTypeLastCell).Row
' Debug.Print outputs values to the Immediate Window which will be at the
' bottom of the VB Editor window. If the Immediate Window is is missing,
' click Ctrl+G.
Debug.Print "Last used column " & ColMax
Debug.Print "Last used row " & RowMax
End With
End Sub
With my data, the macro output the following to the Immediate Window:
Last used column 10
Last used row 13
Column "J" is the 10th column. This code identifies the last row and column used which I must know if my macro is to examine the correct number of rows and columns. Outputting values to the Immediate Window is an easy way of checking your code.
Now add macro Test2 to the module and run it:
Sub Test2()
Dim ColCrnt As Long
Dim ColMax As Long
With Worksheets("Input")
ColMax = .Cells.SpecialCells(xlCellTypeLastCell).Column
Debug.Print "Row 1:"
For ColCrnt = 1 To ColMax
Debug.Print "Col " & ColCrnt & "=" & .Cells(1, ColCrnt).Value & " ";
If ColCrnt Mod 5 = 0 Then
Debug.Print
End If
Next
End With
End Sub
With my data, the macro output the following to the Immediate Window:
Row 1:
Col 1=Id Col 2=Value 1 Col 3=Value 2 Col 4=Value 3 Col 5=Value 4
Col 6=Value 5 Col 7=Value 6 Col 8=Value 7 Col 9=Value 8 Col 10=Value 9
I will still expect you to use VB Help or the internet to get a description of the statements I have used but some explanation of what I am doing is necessary.
Consider:
ColMax = Worksheets("Input").Cells.SpecialCells(xlCellTypeLastCell).Column
In macro Test2 I had separated Worksheets("Input")
from the rest of the string into a With Statement
. This makes the code faster, clearer and smaller but I could have written a single string as here.
Worksheets("Input")
references the entire worksheet.
Worksheets("Input").X
references part X of the worksheet. I could have referenced the charts or the default values but I wanted to reference the cells. Worksheets("Input").Cells
references all cells within the worksheet.
Worksheets("Input").Cells.X
references part of the cells or a method that operates on cells. Worksheets("Input").Cells.Sort
, for example, would allow me to sort the worksheets.
Worksheets("Input").Cells.SpecialCells
gives me access to one of a set of methods which return information about Worksheets("Input").Cells
. Adding a parameter to give: Worksheets("Input").Cells.SpecialCells(xlCellTypeLastCell)
says which method I want.
Lastly I add .Column
to identify the property I require.
Understanding this dot notation is vital if you are to understand VBA or almost any modern programming language. In X.Y
, Y can be part of X, a method that applies to X or a property of X.
Worksheets("Input").Cells(R, C)
allows me to access the single cell at row R and column C. R is an integer with a minimum value of 1 and a maximum value that depends on the version of Excel being used. Rows.Count
gives the maximum row number for the version you are using. C can be an integer (say 5) or a column code (say "E"). Column "A" is column 1.
Debug.Print Expression
outputs Expression to the Immediate Window and follows it with a newline. Debug.Print Expression;
outputs Expression to the Immediate Window but does not follow it with a newline.
ColCrnt Mod 5
returns the remainder of ColCrnt divided by 5. By testing for this remainder being 0, I can add a newline every 5 lines.
I have used the for-loop to output every value in row 1.
Although macro Test2 only contains 14 statements, it uses a lot of VBA concepts. Work through it slowly. Use F8 to step through the macro statement by statement and study what each statement does. If you can understand this macro, you know almost everything you need to know to solve your problem.
Now we need to think about matching rows. I will not use an efficient algorithm for matching rows because that would require more complicated VBA. You can enhance the code later once you have built up your knowledge. The approach I will use involves:
To record matches I need some way of recording that rows 2, 4 and 5 are the same while I go on to discover that the rows 3, 6 and 8 are the same. Having matched row 4 against row 2, I do not want to check row 4 against 5.
I will meet the second requirement with a boolean array:
Dim Matched() As Boolean
ReDim Matched(2 To RowMax)
For RowMast = 2 to RowMax
Matched(RowMast) = False
Next
In Dim Matched() As Boolean
, ()
says I want a dynamic array. A dynamic array is one in which I can change the upper and lower bounds at run time. VBA is one of the few languages that allow dynamic arrays and is one of even fewer languages that allow you to set the lower bound.
ReDim Matched(2 To RowMax)
specifies the lower bound as 2 (= the first data row) and the upper bound as RowMax (= the last data row). You will often see statements such as ReDim Matched(N)
which says I want N entries and leaves the compiler to determine the lower bound according the Option Base statement if used. I always specify the lower bound because I do not want someone interfering with my arrays by adding or changing an Option Base statement.
The following sets every element of Matched to False. This is not necessary because most modern languages initialise variables. I remember when this was not the case and prefer to be explicit.
For RowMast = 2 to RowMax
Matched(RowMast) = False
Next
If P > N > M, when I match rows N and P against row M, I will set Matched(N) and Matched(P) to True so I do not test row N against later rows.
There are many ways of recording matches. I am going to use the crude technique of building a string.
Macro Test3 creates the output the seek. It is not an efficient piece of code but it does the job with the minimum of VBA. Add this macro to the module and run it. The output to the Immediate Window is the output you requested except for the extra rows I have added:
A, C, D
B, E, G
F, J
I, L
Good luck with VBA programming.
Sub Test3()
Dim ColCrnt As Long
Dim ColMax As Long
Dim MatchCrnt As Boolean
Dim Matched() As Boolean
Dim MatchStgTotal As String
Dim MatchStgCrnt As String
Dim RowMast As Long ' The master row; the row I am comparing
' against later rows
Dim RowMax As Long
Dim RowSub As Long ' The subordinate row; the row I am comparing
' against an earlier row
With Worksheets("Input")
ColMax = .Cells.SpecialCells(xlCellTypeLastCell).Column
RowMax = .Cells.SpecialCells(xlCellTypeLastCell).Row
MatchStgTotal = "" ' No matches discovered yet
' Initialise Matched
ReDim Matched(2 To RowMax)
For RowMast = 2 To RowMax
Matched(RowMast) = False
Next
For RowMast = 2 To RowMax
If Not Matched(RowMast) Then
' This row has not been matched against an earlier row
MatchStgCrnt = "" ' No matches for row RowMast discovered yet
For RowSub = RowMast + 1 To RowMax
' Match row RowMast against every later row
If Not Matched(RowSub) Then
' This row has not been matched against an earlier row
MatchCrnt = True ' Assume RowSub matches RowMast
' until find otherewise
For ColCrnt = ColMax To 2 Step -1
' Compare cells from right to left so rows with different
' numbers of values fails to match quickly. This is the only
' consession to efficiency in this loop. There are much better
' ways of doing this but I think I have included enough VBA in
' this tutorial.
If .Cells(RowMast, ColCrnt).Value <> _
.Cells(RowSub, ColCrnt).Value Then
' These rows do not match
MatchCrnt = False
Exit For ' N point checking further cells
End If
Next
If MatchCrnt Then
' Row RowSub matches RowMast
' Add this row's Id to the list of matches against RowMast
MatchStgCrnt = MatchStgCrnt & ", " & .Cells(RowSub, 1).Value
Matched(RowSub) = True ' Do not check this row again
End If
End If
Next RowSub
If MatchStgCrnt <> "" Then
' RowMast has been matched against one or more other rows.
' MatchCrnt contains a list of those other rows.
If MatchStgTotal <> "" Then
' A previous row have been matched.
' Terminate it's string with a newline
MatchStgTotal = MatchStgTotal & vbLf
End If
MatchStgTotal = _
MatchStgTotal & .Cells(RowMast, 1).Value & MatchStgCrnt
End If
End If
' Note: Matched(RowMast) has not been set if row RowMast has been matched
' because I will never loook as row RowMast again.
Next RowMast
End With
Debug.Print MatchStgTotal
End Sub
Upvotes: 2