sam
sam

Reputation: 10064

deleting duplicates with macros

ive got a list of urls that have been put together from several sources, some of the source overlap so i have duplicate listings, although they are not exact duplicates, some have http://, some www., some with a trailing slash ect.

At the moment ive got a script that can work with exact duplicates, what would i need to change init to work as above ?

this is what ive got so far

Sub Delete_duplicates()
Dim iListCount As Integer
Dim iCtr As Integer

' Turn off screen updating to speed up macro.
Application.ScreenUpdating = False

' Get count of records to search through.
iListCount = Sheets("Sheet1").Range("A1:A100").Rows.Count
Sheets("Sheet1").Range("A1").Select
' Loop until end of records.
Do Until ActiveCell = ""
   ' Loop through records.
   For iCtr = 1 To iListCount
      ' Don't compare against yourself.
      ' To specify a different column, change 1 to the column number.
      If ActiveCell.Row <> Sheets("Sheet1").Cells(iCtr, 1).Row Then
         ' Do comparison of next record.
         If ActiveCell.Value = Sheets("Sheet1").Cells(iCtr, 1).Value Then
            ' If match is true then delete row.
            Sheets("Sheet1").Cells(iCtr, 1).Delete xlShiftUp
               ' Increment counter to account for deleted row.
               iCtr = iCtr + 1
         End If
      End If
   Next iCtr
   ' Go to next record.
   ActiveCell.Offset(1, 0).Select
Loop
Application.ScreenUpdating = True
MsgBox "Done!"
End Sub

Upvotes: 0

Views: 368

Answers (1)

MikeD
MikeD

Reputation: 8941

You could use a function to "normalize" your URL's, i.e.

...
            If strapUrl(ActiveCell) = strapUrl(Sheets("Sheet1").Cells(iCtr, 1)) Then
...

Function strapURL(Arg As String) As String
Dim Tmp As String

    Tmp = Replace(Arg, "http://", "")     ' remove http://
    Tmp = Replace(Tmp, "www.", "")        ' remove www.
    If Right(Tmp, 1) = "/" Then
        Tmp = Left(Tmp, Len(Tmp) - 1)     ' remove trailing /
    End If
    strapURL = Tmp

End Function

Applying this function to some samples within the worksheet, you would yield

http://www.mydomain.com/    mydomain.com
www.mydomain.com/           mydomain.com
mydomain.com/               mydomain.com
http://www.mydomain.com     mydomain.com
www.mydomain.com            mydomain.com
mydomain.com                mydomain.com

which makes you compare URL's on an "equal basis".

Upvotes: 1

Related Questions