Reputation: 10224
I have a Range
object referencing a column on a worksheet. This column contains mixed datatypes (numbers, text, and a few other things).
I want to remove duplicates from the column:
rge.RemoveDuplicates Columns:=1, Header:xlNo
but this does not correctly remove all duplicates due to the mixed datatypes.
I know that mixed datatypes are the problem because manually converting the cells using TEXT($REF, "0")
before attempting to remove duplicates from a copy of those values is successful.
How can I replace all values in the range with their text equivalents?
I've tried the obvious:
rge = rge.Text
rge.Value = rge.Text
but without success.
Please note that iteration is not an option; I'm dealing with tens of thousands of rows of data, and the performance penalty for writing cells individually is far too high. I need something which can operate on the entire range at once.
(If it turns out that iteration is the only solution, it will actually be faster to make a first pass using .RemoveDuplicates
, sort the data, then manually take out the remaining ones in n
time.)
EDIT: Additional information
If I copy and paste a subset of the range which does not contain duplicates, and then manually run Remove Duplicates on it, the duplicate values are removed.
However, if I copy a subset of the range which also contains numbers, then duplicates are not removed, even though the duplicates are not themselves numbers.
My guess (and this is only a guess) is that internally excel uses a different comparison algorithm for mixed-datatype values than it does for purely text values.
Minimum working example: https://dl.dropboxusercontent.com/u/1402749/dups.xlsx
Upvotes: 0
Views: 2412
Reputation: 3279
Your sample dataset was already formatted as text... I changed several rows to a number format and was able to remove the duplicates (without formatting everything as text) using the following code:
Sub RemoveDuplicates()
Dim r As Range
Dim w As Worksheet
Set w = ActiveSheet
Set r = w.Range("A1:A100000")
r.AdvancedFilter Action:=xlFilterCopy, CopyToRange:=r.Offset(0, 1), Unique:=True
End Sub
The above code puts the unique values in column B, so you'd have to modify this to suit your needs. If you'd prefer the data remain in column A, you can create a temp sheet to put the unique values, delete your original data set, and then move the unique values back into the original sheet.
The above code assumes that you have a header for your dataset. I also don't know how well this will perform on large datasets... so you may need to do some testing to see if it will work for you.
Edit
I just tested this on 100K rows and it took ~50 seconds to finish... so I'm guessing this solution isn't viable. And I just saw you chose David's answer. :) I'll leave this up in case it helps someone else in the future.
Edit 2
I missed Ron's answer before I posted mine. We use the same function, but his answer has more functionality than mine.
Upvotes: 0
Reputation: 19727
You can have a go at this:
If your data size is <= 30k rows: Elapse time around 0.2 sec comparable to Excel's RemoveDuplicates
Dim arr As Variant, i As Long
'~~> pass range values to array
With SheetCodename '~~> Change to suit
arr = Application.Transpose(.Range("A1", .Range("A" & .Rows.Count).End(xlUp)))
End With
'~~> use Dictionary to remove dupes
With CreateObject("Scripting.Dictionary")
For i = LBound(arr) To UBound(arr)
.Item(CStr(arr(i))) = CStr(arr(i))
Next
SheetCodename.Range("A:A").ClearContents '~~> Clear source range
'~~> Return unique items to range
SheetCodename.Range("A1", "A" & .Count) = Application.Transpose(.Items)
End With
I tested this on your sample data and 17 unique values were returned.
For larger data sets though, this might fail due to Excel Memory shortcomings.
Edit1:
I got really interested to make this work in 100k rows and more so.
I then stumbled on THIS and below is what I've come up with.
Actual number of data tested: 168091
Dim rng As Range, cel As Range
Dim arr() As Variant, i As Long, key, start
start = Timer
With Sheet4
Set rng = .Range("A1", .Range("A" & .Rows.Count).End(xlUp))
rng.RemoveDuplicates 1, xlNo
End With
Debug.Print Timer - start '3.585938 sec
start = Timer
With Sheet2
Set rng = .Range("A1", .Range("A" & .Rows.Count).End(xlUp))
End With
'~~> Use Dictionary to remove duplicates
With CreateObject("Scripting.Dictionary")
'~~> need to loop through range since Array have limitations as well
For Each cel In rng
.Item(CStr(cel.Value2)) = CStr(cel.Value2)
Next
'~~> array limit workaround
ReDim arr(.Count, 2): i = 0
For Each key In .Keys
arr(i, 0) = .Item(key)
i = i + 1
Next
'~~> Return unique items to range
Sheet2.Range("A:A").ClearContents
Sheet2.Range("A1", "A" & .Count) = arr
End With
Debug.Print Timer - start '5.257813 sec
The result is the same with using Excels RemoveDuplicates (I mean the unique output).
There is a 1.671875 sec difference in performance but for me it is still manageable.
Upvotes: 0
Reputation: 53623
I didn't try to use the RemoveDuplicates
method since it seems to not work for you anyways.
I use a dictionary object to do the dirty work and help ensure uniqueness. Based on the (apparent) success of this example, I'm not sure you need to worry about converting values to text. This iteration uses the values only, and then re-writes to the range. If you require additional formatting, please clarify :)
Sub Test()
Dim d As Object 'Scripting.Dictionary
' requires reference to Microsoft Scripting Runtime if you
' want to use early-binding
Dim rng As Range
Dim cl As Variant
Dim var As Variant
'#Define our range
Set rng = Range("A1:A22")
'#Store values in an array
var = rng.Value
'#Instantiate our dictioanry object
Set d = CreateObject("Scripting.Dictionary")
'#store unique vals in the dictionary
For Each cl In var
d(cl) = cl
Next
'#Clear the original range
rng.Clear
'#Put the unique vals in to the range
rng.Resize(UBound(d.Keys) + 1).Value = Application.Transpose(d.Keys())
Set d = Nothing
End Sub
On the sample data, I end up with 17 unique values:
Upvotes: 3
Reputation: 60199
I don't know why removeduplicates doesn't work. But I it doesn't work on your sample data. As a "workaround" I would suggest trying the Advanced Filter. The only drawback will be that it will always consider the first row as a header, so you may need to compensate for that. Here is a routine that works on your sample data. I chose to copy to a new destination, then overwrite the original, but you may wish to use a different scheme.
Also, if it works for you, you may want to disable screen updating while the macro is running.
By the way, the routine works with General formatting and mixed numeric and text data also. Probably no need to convert everything to text.
Sub RemDups()
Dim R As Range
Dim rDest As Range
Set R = Range("a1", Cells(Rows.Count, "A").End(xlUp))
Set rDest = Range("D1")
rDest.EntireColumn.Clear
R.AdvancedFilter xlFilterCopy, , rDest, True
R.EntireColumn.Clear
Set rDest = Range(rDest, Cells(Rows.Count, rDest.Column).End(xlUp))
rDest.Copy R(1)
rDest.Clear
End Sub
Upvotes: 0
Reputation: 414
I think there's an issue with your use of the RemoveDuplicates syntax.
Try:
rge.RemoveDuplicates Columns:=Array(1), Header:=xlNo
I would recommend running the other code to first standardize the formats as text though. This syntax worked fine for me.
Upvotes: 0
Reputation: 414
I believe this will provide the result you are looking for. Insert this function into VBA editor.
Public Sub ConvertToText()
Dim c As Range
Dim a As Areas
Dim v As Variant
Set c = Selection
Set a = c.Areas
If a.Count > 1 Then
' IF DESIRED YOU CAN EXTEND THE LOGIC FOR MULTIPLE AREAS | CURRENT FUNCTION DOES NOT SUPPORT
MsgBox "Select one continuous range.", vbCritical, "Error"
Exit Sub
End If
v = WorksheetFunction.Transpose(WorksheetFunction.Transpose(c.Value))
c.Clear
c.NumberFormat = "@"
c = v
End Sub
Upvotes: 0
Reputation: 1195
Selection.NumberFormat = "@"
I would think rge.NumberFormat = "@" would work
Upvotes: 0