BruceWayne
BruceWayne

Reputation: 23283

Removing Numbers and Errant Letters from string

I'm trying to parse some information down (with VBA, in Excel), removing numbers and "associated letters". Unfortunately, there's not much of a straightforward pattern I can think of so far as programming goes. However, "looking at it", I can see where I'd want to cut the data off.

Here's a mock list of info (each line is a single cell):

Maersk Bentonville voy. 1283
CRC Brooklyn 183E
Piraeus 298ER20
Seago Istanbul - voy. 182
Maersk Bulan YH828
Buton 164NN
Seago Bremerhaven One
Seago Antwerp 149E
Captain Maersk NY8821
Carsten Maersk OC9192
Cecilie Vessel 12N3R
Charlotte Maersk 1290

Desired Output:

Maersk Bentonville
CRC Brooklyn
Piraeus
Seago Istanbul
Maersk Bulan
Buton
Seago Bremerhaven One
Seago Antwerp
Captain Maersk
Carsten Maersk
Cecilie Vessel
Charlotte Maersk

I'm just trying to get the name (i.e. Maersk Bentonville). There's not a master list I have anywhere to compare to, so I just want to remove "voy." and the number/letter group, i.e. 183E, 298ER20, etc.

I'm able to get pretty close with the below function:

Function RemoveNumbers(Txt As String) As String
With CreateObject("VBScript.RegExp")
    .Global = True
    .Pattern = "[0-9]|[-+./]+|(voy)"
    RemoveNumbers = .Replace(Txt, "")
End With
End Function

But you'll notice it leaves me with Maersk Brooklyn E, Maersk Bulan YH.

I suppose my question is: Is there a way to see if that group with numbers has letters in it? And if so, just remove that group?

I'm sure there's a regex pattern, but I'm still learning regex so haven't figured it out, but would love to find one.

If there are any other options, I'm open to that too. Of course, I can loop through each character in the cell, and note when a number appears next to a letter, then start tracking until the next space, then remove that group, but that seems pretty inefficient.

Thanks for any ideas, let me know if I can clarify anything!

PS: I can't just "cut" the last group of text. Sometimes I have a cell with three names I want to keep (Seago Bremerhaven One), and sometimes I just have two parts Maersk Ship, Seago Antwerp, or Seago 29E2R.

Upvotes: 3

Views: 434

Answers (1)

Rahul
Rahul

Reputation: 2738

Using the anchor ^ at beginning of string you can extract names.

Regex: ^(?:[A-Za-z]+\s)* with multi-line option ON

Explanation:

^ will start matching at beginning of line. (?:[A-Za-z]+\s)* will look for combination of characters (assuming names don't have numbers in it) followed by a whitespace. This pattern will be search for multiple times since * quantifier is used. Thus matching more than one name in a line.

Regex101 Demo

Upvotes: 3

Related Questions