Kurt
Kurt

Reputation: 3

What am I doing wrong? Removing duplicates using Excel VBA

I'm new to VBA so this is probably a very obvious mistake.

To keep it short, I am trying to delete rows based on two criteria: In Column A, if they have the same value (duplicate) and in Column B, the difference is less than 100, then one row is deleted from the bottom.

Example data:

Column A  Column B          
1         300              
1         350     SHOULD be deleted as second column diff. is <100 compared to row above
2         500              
2         700     Should NOT be deleted as second column diff. is not <100

Here is the code I have come up with:

Sub deduplication()

Dim i As Long
Dim j As Long
Dim lrow As Long

Application.ScreenUpdating = False

With Worksheets("Sheet1")

lrow = .Range("A" & .Rows.Count).End(xlUp).Row

    For i = lrow To 2 Step -1
        For j = i To 2 Step -1
            If .Cells(i, "A").Value = .Cells(j, "A").Value And .Cells(i, "B").Value - .Cells(j, "B").Value < 100 Then
               .Cells(i, "A").EntireRow.Delete
            End If
        Next j
    Next i

End With

End Sub

This largely works, but only if the second criterion is greater than (>) rather than less than (<). When it is less than, it deletes every row. What am I doing wrong? Is there an easy fix?

Thank you

Upvotes: 0

Views: 173

Answers (5)

urdearboy
urdearboy

Reputation: 14580

Sticking to the format of your code, you can do this using one For loop as well.

For i = lrow To 3 Step -1
    If .Cells(i, "A") = .Cells(i - 1, "A") And (.Cells(i, "B") - .Cells(i - 1, "B")) < 100 Then
        .Cells(i, "A").EntireRow.Delete
    End If
Next i

Upvotes: 1

Constantine
Constantine

Reputation: 1

Why not to use the built-in command:

Worksheets("Sheet1").Range("$A:$A").RemoveDuplicates Columns:=1, Header:=xlYes

Range.RemoveDuplicates Method (Excel)

Upvotes: 0

Mistella
Mistella

Reputation: 1728

Every first j-cycle starts off by comparing a row to itself since you start with j = i. The difference between a value and itself is always zero. (It also compares row 2 with itself as the very last step.)

However, if you switch:

For i = lrow To 2 Step -1
For j = i To 2 Step -1

to:

For i = lrow To 3 Step -1
For j = i - 1 To 2 Step -1`

the code will compare all the various rows without the self-compares.

Another point (which @Proger_Cbsk 's answer brought to mind), is that doing the comparison with just the subtraction .Cells(i, "B").Value - .Cells(j, "B").Value < 100 will sometimes cause unexpected results.

For example, assume .Cells(i, "B").Value = 1 and .Cells(j, "B").Value = 250. We can tell by just looking, that there is a difference of at least 100, so you would expect this part of the expression to evaluate to False. However, from straight substitution, you get the expression: 1 - 250 < 100. Since 1 - 250 = -249, and since -249 < 100, the expression would actually evaluate to True.

However, if you were to change .Cells(i, "B").Value - .Cells(j, "B").Value < 100 to Abs(.Cells(i, "B").Value - .Cells(j, "B").Value) < 100, the expression will now be looking at if the difference is greater or less than 100, instead of looking at if the subtraction result is greater or less than 100.

Upvotes: 0

tigeravatar
tigeravatar

Reputation: 26640

Something like this should work for you:

Sub tgr()

    Dim ws As Worksheet
    Dim rDel As Range
    Dim rData As Range
    Dim ACell As Range
    Dim hUnq As Object

    Set ws = ActiveWorkbook.Sheets("Sheet1")
    Set hUnq = CreateObject("Scripting.Dictionary")


    Set rData = ws.Range("A2", ws.Cells(ws.Rows.Count, "A").End(xlUp))
    If rData.Row = 1 Then Exit Sub  'No data

    For Each ACell In rData.Cells
        If Not hUnq.Exists(ACell.Value) Then
            'New Unique ACell value
            hUnq.Add ACell.Value, ACell.Value
        Else
            'Duplicate ACell value
            If Abs(ws.Cells(ACell.Row, "B").Value - ws.Cells(ACell.Row - 1, "B").Value) < 100 Then
                If rDel Is Nothing Then Set rDel = ACell Else Set rDel = Union(rDel, ACell)
            End If
        End If
    Next ACell

    If Not rDel Is Nothing Then rDel.EntireRow.Delete

End Sub

Upvotes: 1

Proger_Cbsk
Proger_Cbsk

Reputation: 462

Not

If .Cells(i, "A").Value = .Cells(j, "A").Value And .Cells(i, "B").Value - .Cells(j, "B").Value < 100 Then

Here in the second part of the statement, you're just comparing .Cells(j, "B").Value to const 100 !

But

If .Cells(i, "A").Value = .Cells(j, "A").Value And Abs(.Cells(i, "B").Value - .Cells(j, "B").Value) < 100 Then

Abs() may help, else keep just the ( )

Upvotes: 3

Related Questions