Brent Oliver
Brent Oliver

Reputation: 171

Merging data in Excel

Working with Excel files that contain over 3000 columns, and having an issue that there are several column headers that are duplicated. When this data is fed into another system it errors out. Looking for a way to merge the columns in the spreadsheet. Same header can appear in two to six columns, but each row of data only populates a single column.

I have seen one post that merges the column headers of the duplicates are next to each other, and I can do this (as I have in my sample data), but that post only merges the header data.

Not sure how to attach sample data, but hope people can see this:

1350725  1350725  1350740  1350813  1351468 1351468
B                          A        C
         A        C        B                E
C                 D        C        E  
A                 C        C        D
B                          E                B

Upvotes: 0

Views: 227

Answers (4)

virtualdvid
virtualdvid

Reputation: 2411

You can do this easily with Power query. It is an add-in for Excel 2010+ (by default in Excel 2016 known as Get & Transform). There you can connect directly Excel with any source of data and then transform the data in the Query Editor. For your case follow this steps:

Upvotes: 2

jeffreyweir
jeffreyweir

Reputation: 4824

PowerQuery is by far the best tool for this, because you can whip together a solution in mere minutes without expending too much brain power.

But for completeness, here's a VBA solution that will do what you want, and that also handles more than two duplicate columns. It assumes these columns will always be located alongside, as with your sample data.

This took me between 30mins and 60mins to put together and troubleshoot, because I was trying to optimize and dealing with columns you delete takes a bit of thought. By contrast, putting together a solution in PQ probably would have taken me mere minutes. That's why I voted for the approach that @virtualdvid took. And in terms of efficiency and robustness, my approach isn't as fast or robust as Rick's Dictionary approach below. This horse would come in a distant third compared to PQ or Dictionary.

Sub Test()

Dim lLastRow As Long
Dim lLastCol As Long
Dim i As Long
Dim j As Long
Dim k As Long
Dim sHeader1 As String
Dim sHeader2 As String

lLastCol = ActiveSheet.Cells(1, ActiveSheet.Columns.Count).End(xlToLeft).Column

For i = lLastCol To 1 Step -1
    sHeader1 = Cells(1, i)
    For j = i - 1 To 1 Step -1
        sHeader2 = Cells(1, j)
        If sHeader2 <> sHeader1 Then Exit For
        If sHeader1 = sHeader2 Then
            lLastRow = ActiveSheet.Cells(ActiveSheet.Rows.Count, i).End(xlUp).Row
            For k = 2 To lLastRow
                If Cells(k, i).Value <> "" Then
                    Cells(k, j).Value = Cells(k, i).Value
                End If
            Next k
            Columns(i).Delete Shift:=xlToLeft
        End If
    Next j
Next i

End Sub

Note that this is not optimal VBA. You can optimize this further by turning off screenupdating. Even better, don't bother doing that but instead pull ALL the data into VBA in one go in a Variant Array, do the consolidation there using similar code, then dump it back into the worksheet in one go. And even better than that would be a Dictionary approach similar to Ricks.

Upvotes: 0

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60174

Hopefully this should work. I use a dictionary storing arrays to dedupe the columns. Note that you MUST set a reference as in the comments (or make some small changes to use late-binding). And also you will need to change the source and results worksheet names to align with your data.

Furthermore, an assumption is that the source data table is the only thing on this worksheet, and it starts in A1. The LastRowCol function detects the end point of the data.

If your source data table does not meet these requirements, changes will need to be made to detect the correct data area.

'Set reference to Microsoft Scripting Runtime
Option Explicit
Sub CombineColumns()
    Dim wsSrc As Worksheet, wsRes As Worksheet, rRes As Range
    Dim vSrc As Variant, vRes As Variant
    Dim dD As Dictionary
    Dim I As Long, J As Long
    Dim lLastRowCol() As Long
    Dim V() As Variant

'set Source and REsults worksheets, ranges
 Set wsSrc = Worksheets("sheet5")
 Set wsRes = Worksheets("sheet6")
    Set rRes = wsRes.Cells(1, 1)

'Get source data into vba array
With wsSrc
    lLastRowCol = LastRowCol(wsSrc.Name)
    vSrc = .Range(.Cells(1, 1), .Cells(lLastRowCol(0), lLastRowCol(1)))
End With

'Collect and merge the data
Set dD = New Dictionary
ReDim V(2 To UBound(vSrc, 1))
For J = 1 To UBound(vSrc, 2)
    If Not dD.Exists(vSrc(1, J)) Then 'set new dictionary item
        For I = 2 To UBound(vSrc, 1)
            V(I) = vSrc(I, J)
        Next I
        dD.Add Key:=vSrc(1, J), Item:=V
    Else 'combine the columns
        For I = 2 To UBound(vSrc, 1)
            If vSrc(I, J) <> "" Then
                V = dD(vSrc(1, J))
                V(I) = vSrc(I, J)
                dD(vSrc(1, J)) = V
            End If
        Next I
    End If
Next J

'Write results to output array
ReDim vRes(0 To UBound(vSrc, 1) - 1, 1 To dD.Count)

'Headers
J = 0
Dim V1 As Variant
For Each V1 In dD.Keys
    J = J + 1
    vRes(0, J) = V1
Next V1

'Data
For J = 1 To UBound(vRes, 2)
    I = 0
    For Each V1 In dD(vRes(0, J))
        I = I + 1
        vRes(I, J) = V1
    Next V1
Next J

'write the results
Set rRes = rRes.Resize(UBound(vRes, 1) + 1, UBound(vRes, 2))
With rRes
    .EntireColumn.Clear
    .Value = vRes
    With .Rows(1)
        .Font.Bold = True
        .HorizontalAlignment = xlCenter
    End With
    .EntireColumn.AutoFit
End With

End Sub

Private Function LastRowCol(Worksht As String) As Long()
    Dim WS As Worksheet, R As Range
    Dim LastRow As Long, LastCol As Long
    Dim L(1) As Long
Set WS = Worksheets(Worksht)
With WS
    Set R = .Cells.Find(what:="*", after:=.Cells(1, 1), _
                    LookIn:=xlValues, searchorder:=xlByRows, _
                    searchdirection:=xlPrevious)

    If Not R Is Nothing Then
        LastRow = R.Row
        LastCol = .Cells.Find(what:="*", after:=.Cells(1, 1), _
                    LookIn:=xlValues, searchorder:=xlByColumns, _
                    searchdirection:=xlPrevious).Column
    Else
        LastRow = 1
        LastCol = 1
    End If
End With

L(0) = LastRow
L(1) = LastCol
LastRowCol = L
End Function

Original Data

enter image description here

Combined

enter image description here

Upvotes: 2

dwirony
dwirony

Reputation: 5450

This might work for you:

Sub Test()

Dim lastcol As Long, lastrow As Long, lastrow2 As Long, i As Long, j As Long, k As Long

lastcol = ActiveSheet.Cells(1, ActiveSheet.Columns.Count).End(xlToLeft).Column

For i = 1 To lastcol
    For j = i To lastcol
        If Cells(1, i).Value = Cells(1, j).Value And i <> j Then 'Merge em
            lastrow = ActiveSheet.Cells(ActiveSheet.Rows.Count, i).End(xlUp).Row
            lastrow2 = ActiveSheet.Cells(ActiveSheet.Rows.Count, j).End(xlUp).Row

            If lastrow < lastrow2 Then
                lastrow = lastrow2
            End If

                For k = 2 To lastrow
                    If Cells(k, j).Value <> "" Then
                        Cells(k, i).Value = Cells(k, j).Value
                    End If
                Next k

                Columns(j).Delete Shift:=xlToLeft
                Exit For
        End If
    Next j
Next i

End Sub

Test data:

TestData

On your test data:

TestData2

Not sure why my picture isn't coming through... Sorry about that. EDIT: Looks like it's working now.

Upvotes: -1

Related Questions