Anuj Thite
Anuj Thite

Reputation: 75

Remove redundant data from cell in excel worksheet

I have data present in two cells in 2 different columns.

Ex.:
ColA: A1 Cell has comma separated values 1,2,3
ColB: B1 Cell has comma separated values ABC,DEF,ABC

Want to implement logic so that it that it should get displayed as,

ColA    ColB
1,3     ABC
2       DEF

Ex2.:
ColA: A1 Cell has comma separated values 1,2,3
ColB: B1 Cell has comma separated values ABC,ABC,ABC

ColA      ColB
1,2,3     ABC

Till Now, I have implemented logic for Column B But, Not able to update col A data while doing this.

Sub RemoveDupData()
    Dim sString As String
    Dim MyAr As Variant
    Dim Col As New Collection
    Dim itm

    sString = "ABC,DEF,ABC,CDR"

    MyAr = Split(sString, ",")

    For i = LBound(MyAr) To UBound(MyAr)
        On Error Resume Next
        '-- A collection cannot have the same key twice so here, we are creating a key using the item that we are adding.
        '-- This will ensure that we will not get duplicates.       
        Col.Add Trim(MyAr(i)), CStr(Trim(MyAr(i)))
        On Error GoTo 0
    Next i

    sString = ""

    For Each itm In Col
        sString = sString & "," & itm
    Next

    sString = Mid(sString, 2)

End Sub

Upvotes: 2

Views: 196

Answers (3)

DisplayName
DisplayName

Reputation: 13386

you could use Dictionary object

Option Explicit

Sub RemoveDupData()
    Dim AData As Variant, BData As Variant

    With Range("A1", cells(Rows.Count, 1).End(xlUp))
        AData = Application.Transpose(.Value)
        BData = Application.Transpose(.Offset(, 1).Value)
        .Resize(, 2).ClearContents
    End With

    Dim irow As Long
    For irow = 1 To UBound(AData)
        WriteNoDupes Split(AData(irow), ","), Split(BData(irow), ",")
    Next
    Range("A1:B1").Delete Shift:=xlUp
End Sub

Sub WriteNoDupes(ADatum As Variant, BDatum As Variant)
    Dim iItem As Long, key As Variant
    With CreateObject("scripting.dictionary")
        For iItem = 0 To UBound(ADatum)
            .Item(BDatum(iItem)) = .Item(BDatum(iItem)) & " " & ADatum(iItem)
        Next
        For Each key In .Keys
            cells(Rows.Count, 1).End(xlUp).Offset(1).Value = Replace(Trim(.Item(key)), " ", ",")
            cells(Rows.Count, 2).End(xlUp).Offset(1).Value = key
        Next
    End With
End Sub

Upvotes: 0

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60224

This method is more complex than Jeeped's, but may be more easily adaptable to variations.

I did a row by row type of processing, but, by simply changing how the key is generated, one could de-duplicate the entire data set colB (see comment in the code)

I used a dictionary to ensure non-duplicate keys, and the dictionary item would be a collection of the related colA values.

Sub FixData()
    Dim wsSrc As Worksheet, wsRes As Worksheet, rRes  As Range
    Dim vSrc As Variant, vRes As Variant
    Dim vA As Variant, vB As Variant
    Dim I As Long, J As Long
    Dim dD As Object, Col As Collection
    Dim sKey As String

Set wsSrc = Worksheets("sheet1")

'Note that depending on how you set these parameters, you will be
'able to write the Results anyplace in the workbook,
'even overlying the original data
Set wsRes = Worksheets("sheet1")
    Set rRes = wsRes.Cells(1, 5)

With wsSrc
    vSrc = .Range(.Cells(1, 1), .Cells(.Rows.Count, 2).End(xlUp))
End With

'Use a dictionary to collect both the unique items in ColB (which will be the key)
'and a collection of the relevant objects in ColA
Set dD = CreateObject("scripting.dictionary")
For I = 1 To UBound(vSrc, 1)
    vA = Split(vSrc(I, 1), ",")
    vB = Split(vSrc(I, 2), ",")
        If UBound(vA) <> UBound(vB) Then
            MsgBox "different number of elements in each column"
        End If

        For J = 0 To UBound(vA)
            sKey = vB(J) & "|" & I

            'To remove dups from the entire data set
            ' change above line to:
            'sKey = vB(J)

            If Not dD.Exists(sKey) Then
                Set Col = New Collection
                Col.Add vA(J)
                dD.Add Key:=sKey, Item:=Col
            Else
                dD(sKey).Add vA(J)
            End If
        Next J
Next I

'Create Results array
ReDim vRes(1 To dD.Count, 1 To 2)
I = 0
For Each vB In dD.Keys
    I = I + 1
    vRes(I, 2) = Split(vB, "|")(0)

    For J = 1 To dD(vB).Count
        vRes(I, 1) = vRes(I, 1) & "," & dD(vB)(J)
    Next J
        vRes(I, 1) = Mid(vRes(I, 1), 2) 'remove leading comma
Next vB

'write results to worksheet
Set rRes = rRes.Resize(UBound(vRes, 1), 2)
With rRes
    .EntireColumn.Clear
    .Value = vRes
    .HorizontalAlignment = xlLeft
End With
End Sub

Source Data

enter image description here

Row by Row processing

enter image description here

Entire Data Set processing

enter image description here

Upvotes: 2

user4039065
user4039065

Reputation:

This seems to satisfy both of the examples you posted.

Option Explicit

Sub RemoveDupData()
    Dim i As Long, valA As Variant, valB As Variant, r As Variant
    With Worksheets("sheet7")
        valA = Split(.Cells(1, "A"), Chr(44))
        valB = Split(.Cells(1, "B"), Chr(44))

        For i = LBound(valB) To UBound(valB)
            r = Application.Match(valB(i), valB, 0)
            Select Case True
                Case r < i + 1
                    valB(i) = vbNullString
                Case r > 1
                    .Cells(.Rows.Count, "A").End(xlUp).Offset(1, 0).Resize(1, 2) = _
                        Array(valA(i), valB(i))
                    valA(i) = vbNullString
                    valB(i) = vbNullString
            End Select
        Next i

        valA = Replace(Application.Trim(Join(valA, Chr(32))), Chr(32), Chr(44))
        valB = Replace(Application.Trim(Join(valB, Chr(32))), Chr(32), Chr(44))

        .Cells(1, "A").Resize(1, 2) = Array(valA, valB)
    End With
End Sub

Upvotes: 1

Related Questions