Reputation: 10064
ive got a list of urls that have been put together from several sources, some of the source overlap so i have duplicate listings, although they are not exact duplicates, some have http://, some www., some with a trailing slash ect.
At the moment ive got a script that can work with exact duplicates, what would i need to change init to work as above ?
this is what ive got so far
Sub Delete_duplicates()
Dim iListCount As Integer
Dim iCtr As Integer
' Turn off screen updating to speed up macro.
Application.ScreenUpdating = False
' Get count of records to search through.
iListCount = Sheets("Sheet1").Range("A1:A100").Rows.Count
Sheets("Sheet1").Range("A1").Select
' Loop until end of records.
Do Until ActiveCell = ""
' Loop through records.
For iCtr = 1 To iListCount
' Don't compare against yourself.
' To specify a different column, change 1 to the column number.
If ActiveCell.Row <> Sheets("Sheet1").Cells(iCtr, 1).Row Then
' Do comparison of next record.
If ActiveCell.Value = Sheets("Sheet1").Cells(iCtr, 1).Value Then
' If match is true then delete row.
Sheets("Sheet1").Cells(iCtr, 1).Delete xlShiftUp
' Increment counter to account for deleted row.
iCtr = iCtr + 1
End If
End If
Next iCtr
' Go to next record.
ActiveCell.Offset(1, 0).Select
Loop
Application.ScreenUpdating = True
MsgBox "Done!"
End Sub
Upvotes: 0
Views: 368
Reputation: 8941
You could use a function to "normalize" your URL's, i.e.
...
If strapUrl(ActiveCell) = strapUrl(Sheets("Sheet1").Cells(iCtr, 1)) Then
...
Function strapURL(Arg As String) As String
Dim Tmp As String
Tmp = Replace(Arg, "http://", "") ' remove http://
Tmp = Replace(Tmp, "www.", "") ' remove www.
If Right(Tmp, 1) = "/" Then
Tmp = Left(Tmp, Len(Tmp) - 1) ' remove trailing /
End If
strapURL = Tmp
End Function
Applying this function to some samples within the worksheet, you would yield
http://www.mydomain.com/ mydomain.com
www.mydomain.com/ mydomain.com
mydomain.com/ mydomain.com
http://www.mydomain.com mydomain.com
www.mydomain.com mydomain.com
mydomain.com mydomain.com
which makes you compare URL's on an "equal basis".
Upvotes: 1