sapi
sapi

Reputation: 10224

How do you convert all values in a Range to text?

I have a Range object referencing a column on a worksheet. This column contains mixed datatypes (numbers, text, and a few other things).

I want to remove duplicates from the column:

rge.RemoveDuplicates Columns:=1, Header:xlNo

but this does not correctly remove all duplicates due to the mixed datatypes.

I know that mixed datatypes are the problem because manually converting the cells using TEXT($REF, "0") before attempting to remove duplicates from a copy of those values is successful.


How can I replace all values in the range with their text equivalents?

I've tried the obvious:

rge = rge.Text
rge.Value = rge.Text

but without success.


Please note that iteration is not an option; I'm dealing with tens of thousands of rows of data, and the performance penalty for writing cells individually is far too high. I need something which can operate on the entire range at once.

(If it turns out that iteration is the only solution, it will actually be faster to make a first pass using .RemoveDuplicates, sort the data, then manually take out the remaining ones in n time.)


EDIT: Additional information

If I copy and paste a subset of the range which does not contain duplicates, and then manually run Remove Duplicates on it, the duplicate values are removed.

However, if I copy a subset of the range which also contains numbers, then duplicates are not removed, even though the duplicates are not themselves numbers.

My guess (and this is only a guess) is that internally excel uses a different comparison algorithm for mixed-datatype values than it does for purely text values.


Minimum working example: https://dl.dropboxusercontent.com/u/1402749/dups.xlsx

Upvotes: 0

Views: 2412

Answers (7)

ARich
ARich

Reputation: 3279

Your sample dataset was already formatted as text... I changed several rows to a number format and was able to remove the duplicates (without formatting everything as text) using the following code:

Sub RemoveDuplicates()

Dim r As Range
Dim w As Worksheet

Set w = ActiveSheet
Set r = w.Range("A1:A100000")

r.AdvancedFilter Action:=xlFilterCopy, CopyToRange:=r.Offset(0, 1), Unique:=True

End Sub

The above code puts the unique values in column B, so you'd have to modify this to suit your needs. If you'd prefer the data remain in column A, you can create a temp sheet to put the unique values, delete your original data set, and then move the unique values back into the original sheet.

The above code assumes that you have a header for your dataset. I also don't know how well this will perform on large datasets... so you may need to do some testing to see if it will work for you.


Edit

I just tested this on 100K rows and it took ~50 seconds to finish... so I'm guessing this solution isn't viable. And I just saw you chose David's answer. :) I'll leave this up in case it helps someone else in the future.

Edit 2

I missed Ron's answer before I posted mine. We use the same function, but his answer has more functionality than mine.

Upvotes: 0

L42
L42

Reputation: 19727

You can have a go at this:

If your data size is <= 30k rows: Elapse time around 0.2 sec comparable to Excel's RemoveDuplicates

Dim arr As Variant, i As Long
'~~> pass range values to array
With SheetCodename '~~> Change to suit
    arr = Application.Transpose(.Range("A1", .Range("A" & .Rows.Count).End(xlUp)))
End With 
'~~> use Dictionary to remove dupes
With CreateObject("Scripting.Dictionary")
    For i = LBound(arr) To UBound(arr)
        .Item(CStr(arr(i))) = CStr(arr(i))
    Next
    SheetCodename.Range("A:A").ClearContents '~~> Clear source range
    '~~> Return unique items to range
    SheetCodename.Range("A1", "A" & .Count) = Application.Transpose(.Items)
End With

I tested this on your sample data and 17 unique values were returned.
For larger data sets though, this might fail due to Excel Memory shortcomings.

Edit1:
I got really interested to make this work in 100k rows and more so.
I then stumbled on THIS and below is what I've come up with.
Actual number of data tested: 168091

Dim rng As Range, cel As Range
Dim arr() As Variant, i As Long, key, start

start = Timer
With Sheet4
    Set rng = .Range("A1", .Range("A" & .Rows.Count).End(xlUp))
    rng.RemoveDuplicates 1, xlNo
End With
Debug.Print Timer - start '3.585938 sec

start = Timer
With Sheet2
    Set rng = .Range("A1", .Range("A" & .Rows.Count).End(xlUp))
End With
'~~> Use Dictionary to remove duplicates
With CreateObject("Scripting.Dictionary")
    '~~> need to loop through range since Array have limitations as well
    For Each cel In rng
        .Item(CStr(cel.Value2)) = CStr(cel.Value2)
    Next
    '~~> array limit workaround
    ReDim arr(.Count, 2): i = 0
    For Each key In .Keys
        arr(i, 0) = .Item(key)
        i = i + 1
    Next
    '~~> Return unique items to range
    Sheet2.Range("A:A").ClearContents
    Sheet2.Range("A1", "A" & .Count) = arr
End With
Debug.Print Timer - start '5.257813 sec

The result is the same with using Excels RemoveDuplicates (I mean the unique output).
There is a 1.671875 sec difference in performance but for me it is still manageable.

Upvotes: 0

David Zemens
David Zemens

Reputation: 53623

I didn't try to use the RemoveDuplicates method since it seems to not work for you anyways.

I use a dictionary object to do the dirty work and help ensure uniqueness. Based on the (apparent) success of this example, I'm not sure you need to worry about converting values to text. This iteration uses the values only, and then re-writes to the range. If you require additional formatting, please clarify :)

Sub Test()
Dim d As Object 'Scripting.Dictionary
                ' requires reference to Microsoft Scripting Runtime if you
                ' want to use early-binding
Dim rng As Range
Dim cl As Variant
Dim var As Variant

'#Define our range
Set rng = Range("A1:A22")
'#Store values in an array
var = rng.Value
'#Instantiate our dictioanry object
Set d = CreateObject("Scripting.Dictionary")
'#store unique vals in the dictionary
For Each cl In var
    d(cl) = cl
Next

'#Clear the original range
rng.Clear
'#Put the unique vals in to the range
rng.Resize(UBound(d.Keys) + 1).Value = Application.Transpose(d.Keys())

Set d = Nothing
End Sub

On the sample data, I end up with 17 unique values:

enter image description here

Upvotes: 3

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60199

I don't know why removeduplicates doesn't work. But I it doesn't work on your sample data. As a "workaround" I would suggest trying the Advanced Filter. The only drawback will be that it will always consider the first row as a header, so you may need to compensate for that. Here is a routine that works on your sample data. I chose to copy to a new destination, then overwrite the original, but you may wish to use a different scheme.

Also, if it works for you, you may want to disable screen updating while the macro is running.

By the way, the routine works with General formatting and mixed numeric and text data also. Probably no need to convert everything to text.

Sub RemDups()
    Dim R As Range
    Dim rDest As Range

Set R = Range("a1", Cells(Rows.Count, "A").End(xlUp))

Set rDest = Range("D1")
rDest.EntireColumn.Clear

R.AdvancedFilter xlFilterCopy, , rDest, True

R.EntireColumn.Clear
Set rDest = Range(rDest, Cells(Rows.Count, rDest.Column).End(xlUp))
rDest.Copy R(1)
rDest.Clear

End Sub

Upvotes: 0

Taliesin
Taliesin

Reputation: 414

I think there's an issue with your use of the RemoveDuplicates syntax.

Try:

rge.RemoveDuplicates Columns:=Array(1), Header:=xlNo

I would recommend running the other code to first standardize the formats as text though. This syntax worked fine for me.

Upvotes: 0

Taliesin
Taliesin

Reputation: 414

I believe this will provide the result you are looking for. Insert this function into VBA editor.

Public Sub ConvertToText()

    Dim c As Range
    Dim a As Areas
    Dim v As Variant

    Set c = Selection
    Set a = c.Areas

    If a.Count > 1 Then
        ' IF DESIRED YOU CAN EXTEND THE LOGIC FOR MULTIPLE AREAS | CURRENT FUNCTION DOES NOT SUPPORT
        MsgBox "Select one continuous range.", vbCritical, "Error"
        Exit Sub
    End If

    v = WorksheetFunction.Transpose(WorksheetFunction.Transpose(c.Value))

    c.Clear
    c.NumberFormat = "@"
    c = v

End Sub

Upvotes: 0

MatthewHagemann
MatthewHagemann

Reputation: 1195

Selection.NumberFormat = "@"

I would think rge.NumberFormat = "@" would work

Upvotes: 0

Related Questions