Jason

Reputation: 4772

Strip leading/trailing spaces and commas in Excel

This seems like such a simple requirement that I feel like I am missing something obvious.

I have an Excel spreadsheet with "dirty" text data in it: the cells contain text with unwanted leading and trailing spaces, commas and newlines. I would like to TRIM references to these cells of all those characters.

Note: I don't want to replace all those characters, since they legitimately appear within the cell text. It is only at the start or end of the cell text (i.e. value) that I want to trim them off.

The text data consists of names of people and schools, for cleaning and importing into a CRM.

So, is there a function built in, or do I need to write one? I feel spoiled by the number of string filtering functions in PHP ;-)

Upvotes: 2

Views: 9167

Answers (4)

Rohan Khude

Reputation: 4903

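I did this in two steps:

  1. Removing the spaces
  2. Removing the commas

To remove leading and trailing spaces

Use the built-in function directly: =TRIM(A1). Note that TRIM also collapses runs of spaces inside the text to a single space, and it does not remove newlines.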

To remove one leading and one trailing comma

=MID(A1,IF(LEFT(A1)=",",2,1),MAX(0,LEN(A1)-IF(LEFT(A1)=",",1,0)-IF(RIGHT(A1)=",",1,0)))

or, if the text contains no spaces of its own (every remaining space is turned into a comma):

=SUBSTITUTE(TRIM(SUBSTITUTE(A1,","," "))," ",",")

Upvotes: 0

Jason

Reputation: 4772

I have found this code, which I pasted in as a module into my spreadsheet:

Option Explicit

Function ReReplace(ReplaceIn, _
    ReplaceWhat As String, ReplaceWith As String, Optional IgnoreCase As Boolean = False)

    Dim RE As Object
    Set RE = CreateObject("vbscript.regexp")
    RE.IgnoreCase = IgnoreCase
    RE.Pattern = ReplaceWhat
    RE.Global = True
    ReReplace = RE.Replace(ReplaceIn, ReplaceWith)
End Function

This provides a replace function that supports regular expressions (why doesn't Excel do that itself? It has only been around since 1987. I had it on my Atari ST, not that you could add more than ten cells before it crashed!). This cell function is able to do the trimming I need:

=ReReplace('source worksheet'!cell_reference, "^[\s,]+|[\s,]+$", "")

This works beautifully.
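To see what the pattern does on a sample value, here is a minimal sketch that can be run from the VBA Immediate window. The sample string and the name DemoTrimPattern are made up for illustration; the pattern is the one used above. It builds the RegExp object directly, so it does not depend on ReReplace being present.

```vba
Option Explicit

' Illustrative only: the sample text and the Sub name are assumptions.
Sub DemoTrimPattern()
    Dim re As Object
    Dim dirty As String

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True                    ' replace both the leading and the trailing match
    re.Pattern = "^[\s,]+|[\s,]+$"      ' same pattern as in the cell formula

    ' Leading space/comma/newline, an internal comma, trailing commas/space/newline
    dirty = " , " & vbLf & "St. Mary's School, London ,, " & vbLf

    ' Brackets make the start and end of the result visible
    Debug.Print "[" & re.Replace(dirty, "") & "]"
    ' Prints: [St. Mary's School, London]
End Sub
```

The comma and space inside the text survive because neither alternative can match there: the first is anchored to the start of the string and the second to the end.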

(Note: this answer moved from the question text, where it really should not have been.)

Upvotes: 1

AT_

Reputation: 11

A recursive function to remove trailing commas and spaces, in pure VBA (no regexp object needed):

Function removetrailcomma(txt As String) As String
    If Right(txt, 1) = " " Or Right(txt, 1) = "," Then
        removetrailcomma = removetrailcomma(Left(txt, Len(txt) - 1))
    Else
        removetrailcomma = txt
    End If
End Function
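The same idea can be extended to both ends of the string and to newlines without recursion. The following is only a sketch under assumptions: the name TrimDirt and the exact set of characters to strip (space, comma, tab, carriage return, line feed) are my own choices, not something taken from the question.

```vba
Option Explicit

' Hypothetical helper: strips spaces, commas, tabs and line breaks
' from both ends of txt, leaving the same characters inside the text alone.
Function TrimDirt(ByVal txt As String) As String
    Dim dirt As String
    Dim first As Long
    Dim last As Long

    ' Characters treated as "dirt" (an assumption; adjust as needed)
    dirt = " ," & vbTab & vbCr & vbLf

    first = 1
    last = Len(txt)

    ' Advance past unwanted characters at the start
    Do While first <= last
        If InStr(1, dirt, Mid$(txt, first, 1), vbBinaryCompare) = 0 Then Exit Do
        first = first + 1
    Loop

    ' Back up over unwanted characters at the end
    Do While last >= first
        If InStr(1, dirt, Mid$(txt, last, 1), vbBinaryCompare) = 0 Then Exit Do
        last = last - 1
    Loop

    ' When nothing is left, the length is zero and Mid$ returns ""
    TrimDirt = Mid$(txt, first, last - first + 1)
End Function
```

Placed in a standard module, it can be called from a cell as =TrimDirt(A1). The loops test the character inside an If rather than in the Do While condition because VBA's And does not short-circuit.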

Upvotes: 0

brettdj

Reputation: 55702

This is well suited to a regexp.

The code below, adapted from this article, uses the regexp
"[,\s]*(.+?)[,\s]*$"
to remove any leading and/or trailing whitespace and commas while leaving any such characters within the text body intact.

It replaces your existing data in place.

Sub RemoveDirt()
Dim rng1 As Range
Dim rngArea As Range
Dim lngRow As Long
Dim lngCol As Long
Dim lngCalc As Long
Dim objReg As Object
Dim X()


On Error Resume Next
Set rng1 = Application.InputBox("Select range for the removal of leading/trailing commas and whitespace", "User select", Selection.Address, , , , , 8)
If rng1 Is Nothing Then Exit Sub
On Error GoTo 0

'See Patrick Matthews' excellent article on using Regular Expressions with VBA
Set objReg = CreateObject("vbscript.regexp")
objReg.MultiLine = True
objReg.Pattern = "[,\s]*(.+?)[,\s]*$"

'Speed up the code by turning off screenupdating and setting calculation to manual

'Disable any code events that may occur when writing to cells
With Application
    lngCalc = .Calculation
    .ScreenUpdating = False
    .Calculation = xlCalculationManual
    .EnableEvents = False
End With

'Test each area in the user selected range

'Non contiguous range areas are common when using SpecialCells to define specific cell types to work on
For Each rngArea In rng1.Areas
    'The most common outcome is used for the True outcome to optimise code speed
    If rngArea.Cells.Count > 1 Then
       'If there is more than one cell then set the variant array to the dimensions of the range area
       'Using Value2 provides a useful speed improvement over Value. On my testing it was 2% on blank cells, up to 10% on non-blanks
        X = rngArea.Value2
        For lngRow = 1 To rngArea.Rows.Count
            For lngCol = 1 To rngArea.Columns.Count
                'strip the leading/trailing commas and whitespace
                X(lngRow, lngCol) = objReg.Replace(X(lngRow, lngCol), "$1")
            Next lngCol
        Next lngRow
        'Dump the updated array sans dirt over the initial range
        rngArea.Value2 = X
    Else
        'caters for a single cell range area. No variant array required
        rngArea.Value = objReg.Replace(rngArea.Value, "$1")
    End If
Next rngArea

'cleanup the Application settings
With Application
    .ScreenUpdating = True
    .Calculation = lngCalc
    .EnableEvents = True
End With

Set objReg = Nothing
End Sub

Upvotes: 2
