Chris
Chris

Reputation: 99

Removing unwanted characters from in-front and behind cells in VBA (excel)

I’m very new to programming and although there are several similar questions to mine that have been asked, I can't seem to get them working for my needs.

What I want is to be able to copy raw data into column A, hit run on the macro and it should remove any unwanted characters both before and after the data that I want to keep resulting in a cell just containing the data that I want. I also want it to go through all cells that are in the column, bearing in mind some cells may be empty.

The data that I want to keep is in this format: L1-somedata-0000

The -somedata- text will change but the - ether side will always be there, the L1 will sometimes be L2, and the 0000 (which could be any 4 numbers) will sometimes be any 3 numbers. also there may be some rows in the column that have no useful data, these should be removed. Finally, some cells will not contain any unwanted data, these should stay the same.

Sub Test()
Dim c As Range
For Each c In Range("A2:A" & Range("A" & Rows.Count).End(xlUp).Row)
    c = removeData(c.text)
Next
End Sub

Function removeData(ByVal txt As String) As String
Dim result As String
Dim allMatches As Object
Dim RE As Object

Set RE = CreateObject("vbscript.regexp")

RE.Pattern = "(L1-somedata-\d{4}|\d{3})"
RE.Global = True
RE.IgnoreCase = True
Set allMatches = RE.Execute(text)

If allMatches.Count <> 0 Then
    result = allMatches.Item(0).submatches.Item(0)
End If

ExtractSDI = result

End Function

I have put my code that I've got so far, all it does is go through each cell, if it matches it just removes the text that I want to keep as well as the stuff that I want removed!

I really hope all of that makes sence! Any help will be much appreciated.

Chris

Upvotes: 4

Views: 1303

Answers (2)

Ekkehard.Horner
Ekkehard.Horner

Reputation: 38745

If the "-" are part of the input data, you could use a RegExp Replace like:

>> Set r1 = New RegExp
>> r1.Pattern = "^[^-]+(-[^-]+-).*"
>> WScript.Echo r1.Replace("L2-A-1234", "$1")
>>
-A-

or:

>> Set r1 = New RegExp
>> r1.Pattern = "^[^-]+-([^-]+).*"
>> WScript.Echo r1.Replace("L2-B-123", "$1")
>>
B

Instead of .Replace, you can use Submatches too:

>> WScript.Echo r1.Execute("Don't care-wanted-")(0).SubMatches(0)
>>
wanted

If you need a function, pass the Regexp into the the function; and remember the return value must be assigned to the function name (removeData <> ExtractSDI).

Another possibility for the second spec ("-" not part of desired output):

>> WScript.Echo Split("Whatever-Wanted-Ignore", "-")(1)
>>
Wanted

UPDATE:

To deal with "-" embedded in the desired output and to show how this approach can be used in/as a formula:

Option Explicit

' needs Ref to RegExp

Dim rX As RegExp

Function cleanSDI(s)
  If rX Is Nothing Then
    Set rX = New RegExp
    rX.Pattern = "^([^-]*-)(.+)(-.*)$"
  End If
  cleanSDI = rX.Replace(s, "$2")
End Function

Depending on your data, you may have to change the .Pattern to

rX.Pattern = "^([^-]+-)(.+)(-.+)$"

to allow (*) / forbid (+) empty heads or tails. Use the Docs to work thru/understand the patterns.

Upvotes: 3

Siddharth Rout
Siddharth Rout

Reputation: 149295

You don't need VBA for this. If the data is in say Col A then put this formula in Cell B1 and copy it down.

=IF(AND(MID(A1,3,1)="-",MID(RIGHT(A1,5),1,1)="-"),MID(A1,4,LEN(A1)-8),IF(AND(MID(A1,3,1)="-",MID(RIGHT(A1,4),1,1)="-"),MID(A1,4,LEN(A1)-7),""))

Explanation:

  1. 4 is the length of L1- + 1 (from where we want to retrieve the string
  2. 8 is [3 + 5] which is the length of L1- and -0000
  3. 7 is [3 + 4] which is the length of L1- and -000

enter image description here

Upvotes: 1

Related Questions