Reputation: 23
I was handed a very long Excel file (up to 11,000 rows and 7 columns) in which many cells contain repeated data. I am looking for a macro to get rid of the repeats but couldn't find one.
Example of one such cell:
Ciencias de la Educación,Educación,Pedagogía,Ciencias de la Educación,Educación,Pedagogía
It should look like:
Ciencias de la Educación,Educación,Pedagogía
How can I get rid of the thousands of repeats (not to mention the extra, orphaned, commas)?
Upvotes: 2
Views: 8498
Reputation: 35853
This code runs in 6 seconds on my machine and 2 seconds on @SiddharthRout's machine :)
(with data in cells A1:G20000: 20000 x 7 = 140000 non-empty cells)
Sub test2()
    Dim c, arr, el, data, it
    Dim i As Long, j As Long
    Dim start As Date
    Dim targetRange As Range
    Dim dict As Object
    Set dict = CreateObject("Scripting.dictionary")
    Application.ScreenUpdating = False
    Set targetRange = Range("A1:G20000")
    ' Read the whole range into an array instead of touching cells one by one
    data = targetRange.Value
    start = Now
    For i = LBound(data) To UBound(data)
        For j = LBound(data, 2) To UBound(data, 2)
            c = data(i, j)
            ' Start with an empty dictionary for every cell
            dict.RemoveAll
            arr = Split(c, ",")
            For Each el In arr
                ' Skip empty pieces (orphaned commas) and items already seen
                If Trim(el) <> "" And Not dict.Exists(Trim(el)) Then
                    dict.Add Trim(el), Trim(el)
                End If
            Next
            ' Rebuild the cell text from the unique items
            c = ""
            For Each it In dict.Items
                c = c & it & ","
            Next
            If c <> "" Then c = Left(c, Len(c) - 1)
            data(i, j) = c
        Next j
    Next i
    ' Write everything back in a single operation
    targetRange.Value = data
    Application.ScreenUpdating = True
    MsgBox "Working time: " & Format(Now - start, "hh:nn:ss")
End Sub
You can make this code slightly faster by replacing these two lines
Dim dict As Object
Set dict = CreateObject("Scripting.dictionary")
with
Dim dict As New Dictionary
after adding a reference to the library: in the VBA editor, go to Tools -> References and tick "Microsoft Scripting Runtime".
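If you would rather reuse the per-cell logic elsewhere, it can be pulled out into a function. This is only a minimal sketch of the same dictionary idea: the name DedupeList and its parameters are hypothetical, and it assumes that empty pieces should be dropped and that the first occurrence of each item should be kept.

```vba
' Hypothetical helper: the name DedupeList and its parameters are illustrative,
' not part of the macro above.
Function DedupeList(ByVal source As String, Optional ByVal sep As String = ",") As String
    Dim dict As Object
    Dim part As Variant
    Dim token As String

    Set dict = CreateObject("Scripting.Dictionary")
    ' Optional: uncomment to treat "Educación" and "educación" as the same item
    ' dict.CompareMode = vbTextCompare

    For Each part In Split(source, sep)
        token = Trim$(part)
        ' Skip empty pieces so ",," or a trailing "," leaves no orphaned comma
        If Len(token) > 0 Then
            If Not dict.Exists(token) Then dict.Add token, token
        End If
    Next part

    ' Keys are returned in insertion order, so the first occurrence wins
    DedupeList = Join(dict.Keys, sep)
End Function
```

Placed in a standard module, it should also be callable from a worksheet, e.g. =DedupeList(A1), which is handy for spot-checking a few cells before running a macro over the whole range.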
Upvotes: 1
Reputation: 149315
Here is a basic example
Sub Sample()
    Dim sString As String
    Dim MyAr As Variant
    Dim Col As New Collection
    Dim itm
    Dim i As Long
    sString = "Ciencias de la Educación,Educación,Pedagogía,Ciencias de la Educación,Educación,Pedagogía"
    MyAr = Split(sString, ",")
    For i = LBound(MyAr) To UBound(MyAr)
        ' Adding a key that already exists raises an error,
        ' so ignoring the error silently drops the duplicate
        On Error Resume Next
        Col.Add Trim(MyAr(i)), CStr(Trim(MyAr(i)))
        On Error GoTo 0
    Next i
    ' Rebuild the string from the unique items
    sString = ""
    For Each itm In Col
        sString = sString & "," & itm
    Next
    ' Drop the leading comma
    sString = Mid(sString, 2)
    Debug.Print sString
End Sub
EDIT
Tried and tested in Excel 2010 with A1:G20000 filled with "Ciencias de la Educación,Educación,Pedagogía,Ciencias de la Educación,Educación,Pedagogía"
Time Taken: 2 Seconds
Code
Sub Sample()
    Dim sString As String
    Dim MyAr As Variant, rngAr
    Dim Col As Collection
    Dim itm
    Dim rng As Range
    Dim i As Long, j As Long, k As Long
    Debug.Print "StartTime: " & Now
    Set rng = ThisWorkbook.Sheets("Sheet1").Range("A1:G20000")
    rngAr = rng.Value
    For i = LBound(rngAr) To UBound(rngAr)
        For j = LBound(rngAr, 2) To UBound(rngAr, 2)
            ' Start with an empty collection for every cell; otherwise items
            ' from earlier cells would leak into later ones
            Set Col = New Collection
            MyAr = Split(rngAr(i, j), ",")
            For k = LBound(MyAr) To UBound(MyAr)
                ' Skip empty pieces so orphaned commas are not carried over
                If Len(Trim(MyAr(k))) > 0 Then
                    ' A duplicate key raises an error, which is ignored
                    On Error Resume Next
                    Col.Add Trim(MyAr(k)), CStr(Trim(MyAr(k)))
                    On Error GoTo 0
                End If
            Next k
            sString = ""
            For Each itm In Col
                sString = sString & "," & itm
            Next
            ' Drop the leading comma
            sString = Mid(sString, 2)
            rngAr(i, j) = sString
        Next j
    Next i
    ThisWorkbook.Sheets("Sheet1").Range("A1").Resize(20000, 7).Value = rngAr
    Debug.Print "EndTime: " & Now
End Sub
Upvotes: 1