Reputation: 171
Working with Excel files that contain over 3000 columns, and having an issue that there are several column headers that are duplicated. When this data is fed into another system it errors out. Looking for a way to merge the columns in the spreadsheet. Same header can appear in two to six columns, but each row of data only populates a single column.
I have seen one post that merges the column headers of the duplicates are next to each other, and I can do this (as I have in my sample data), but that post only merges the header data.
Not sure how to attach sample data, but hope people can see this:
1350725 1350725 1350740 1350813 1351468 1351468
B A C
A C B E
C D C E
A C C D
B E B
Upvotes: 0
Views: 227
Reputation: 2411
You can do this easily with Power query. It is an add-in for Excel 2010+ (by default in Excel 2016 known as Get & Transform). There you can connect directly Excel with any source of data and then transform the data in the Query Editor. For your case follow this steps:
Upvotes: 2
Reputation: 4824
PowerQuery is by far the best tool for this, because you can whip together a solution in mere minutes without expending too much brain power.
But for completeness, here's a VBA solution that will do what you want, and that also handles more than two duplicate columns. It assumes these columns will always be located alongside, as with your sample data.
This took me between 30mins and 60mins to put together and troubleshoot, because I was trying to optimize and dealing with columns you delete takes a bit of thought. By contrast, putting together a solution in PQ probably would have taken me mere minutes. That's why I voted for the approach that @virtualdvid took. And in terms of efficiency and robustness, my approach isn't as fast or robust as Rick's Dictionary approach below. This horse would come in a distant third compared to PQ or Dictionary.
Sub Test()
Dim lLastRow As Long
Dim lLastCol As Long
Dim i As Long
Dim j As Long
Dim k As Long
Dim sHeader1 As String
Dim sHeader2 As String
lLastCol = ActiveSheet.Cells(1, ActiveSheet.Columns.Count).End(xlToLeft).Column
For i = lLastCol To 1 Step -1
sHeader1 = Cells(1, i)
For j = i - 1 To 1 Step -1
sHeader2 = Cells(1, j)
If sHeader2 <> sHeader1 Then Exit For
If sHeader1 = sHeader2 Then
lLastRow = ActiveSheet.Cells(ActiveSheet.Rows.Count, i).End(xlUp).Row
For k = 2 To lLastRow
If Cells(k, i).Value <> "" Then
Cells(k, j).Value = Cells(k, i).Value
End If
Next k
Columns(i).Delete Shift:=xlToLeft
End If
Next j
Next i
End Sub
Note that this is not optimal VBA. You can optimize this further by turning off screenupdating. Even better, don't bother doing that but instead pull ALL the data into VBA in one go in a Variant Array, do the consolidation there using similar code, then dump it back into the worksheet in one go. And even better than that would be a Dictionary approach similar to Ricks.
Upvotes: 0
Reputation: 60174
Hopefully this should work. I use a dictionary storing arrays to dedupe the columns. Note that you MUST set a reference as in the comments (or make some small changes to use late-binding). And also you will need to change the source and results worksheet names to align with your data.
Furthermore, an assumption is that the source data table is the only thing on this worksheet, and it starts in A1
. The LastRowCol
function detects the end point of the data.
If your source data table does not meet these requirements, changes will need to be made to detect the correct data area.
'Set reference to Microsoft Scripting Runtime
Option Explicit
Sub CombineColumns()
Dim wsSrc As Worksheet, wsRes As Worksheet, rRes As Range
Dim vSrc As Variant, vRes As Variant
Dim dD As Dictionary
Dim I As Long, J As Long
Dim lLastRowCol() As Long
Dim V() As Variant
'set Source and REsults worksheets, ranges
Set wsSrc = Worksheets("sheet5")
Set wsRes = Worksheets("sheet6")
Set rRes = wsRes.Cells(1, 1)
'Get source data into vba array
With wsSrc
lLastRowCol = LastRowCol(wsSrc.Name)
vSrc = .Range(.Cells(1, 1), .Cells(lLastRowCol(0), lLastRowCol(1)))
End With
'Collect and merge the data
Set dD = New Dictionary
ReDim V(2 To UBound(vSrc, 1))
For J = 1 To UBound(vSrc, 2)
If Not dD.Exists(vSrc(1, J)) Then 'set new dictionary item
For I = 2 To UBound(vSrc, 1)
V(I) = vSrc(I, J)
Next I
dD.Add Key:=vSrc(1, J), Item:=V
Else 'combine the columns
For I = 2 To UBound(vSrc, 1)
If vSrc(I, J) <> "" Then
V = dD(vSrc(1, J))
V(I) = vSrc(I, J)
dD(vSrc(1, J)) = V
End If
Next I
End If
Next J
'Write results to output array
ReDim vRes(0 To UBound(vSrc, 1) - 1, 1 To dD.Count)
'Headers
J = 0
Dim V1 As Variant
For Each V1 In dD.Keys
J = J + 1
vRes(0, J) = V1
Next V1
'Data
For J = 1 To UBound(vRes, 2)
I = 0
For Each V1 In dD(vRes(0, J))
I = I + 1
vRes(I, J) = V1
Next V1
Next J
'write the results
Set rRes = rRes.Resize(UBound(vRes, 1) + 1, UBound(vRes, 2))
With rRes
.EntireColumn.Clear
.Value = vRes
With .Rows(1)
.Font.Bold = True
.HorizontalAlignment = xlCenter
End With
.EntireColumn.AutoFit
End With
End Sub
Private Function LastRowCol(Worksht As String) As Long()
Dim WS As Worksheet, R As Range
Dim LastRow As Long, LastCol As Long
Dim L(1) As Long
Set WS = Worksheets(Worksht)
With WS
Set R = .Cells.Find(what:="*", after:=.Cells(1, 1), _
LookIn:=xlValues, searchorder:=xlByRows, _
searchdirection:=xlPrevious)
If Not R Is Nothing Then
LastRow = R.Row
LastCol = .Cells.Find(what:="*", after:=.Cells(1, 1), _
LookIn:=xlValues, searchorder:=xlByColumns, _
searchdirection:=xlPrevious).Column
Else
LastRow = 1
LastCol = 1
End If
End With
L(0) = LastRow
L(1) = LastCol
LastRowCol = L
End Function
Original Data
Combined
Upvotes: 2
Reputation: 5450
This might work for you:
Sub Test()
Dim lastcol As Long, lastrow As Long, lastrow2 As Long, i As Long, j As Long, k As Long
lastcol = ActiveSheet.Cells(1, ActiveSheet.Columns.Count).End(xlToLeft).Column
For i = 1 To lastcol
For j = i To lastcol
If Cells(1, i).Value = Cells(1, j).Value And i <> j Then 'Merge em
lastrow = ActiveSheet.Cells(ActiveSheet.Rows.Count, i).End(xlUp).Row
lastrow2 = ActiveSheet.Cells(ActiveSheet.Rows.Count, j).End(xlUp).Row
If lastrow < lastrow2 Then
lastrow = lastrow2
End If
For k = 2 To lastrow
If Cells(k, j).Value <> "" Then
Cells(k, i).Value = Cells(k, j).Value
End If
Next k
Columns(j).Delete Shift:=xlToLeft
Exit For
End If
Next j
Next i
End Sub
Test data:
On your test data:
Not sure why my picture isn't coming through... Sorry about that. EDIT: Looks like it's working now.
Upvotes: -1