Edgar Sampere
Edgar Sampere

Reputation: 273

How to parse Excel's cell string with regexp in VBA macro

I'm writing a macro in Excel 2010 in order to remove line breaks in multiple cells of a column. Actually, I have 5 cases for which I have to search within the cells. I enumerate them as follows:

  1. If there's a dot or a comma followed by a blank space and then a newline(\n), replace it with dot or comma( whatever it had before regexp applied) followed by a blank space
  2. If there's a character followed by blank space and then newline, replace it with character and blank space
  3. If there's a character followed by a newline, replace it with character + blank space
  4. If there's a character followed by two blank spaces and then a newline, replace it with character + blank space
  5. If there's a dot at the end of the line, leave it like that cause it's a finishing dot.

As you can see, 2 and 3 are really similar, so I think the regexp could be something like [a-zA-Z0-9]\n but I don't know...first, if it's correct to search for a newline just adding \n and asecond, how to search for a blank space. After detecting the regexp, what follows I think it could be solved with a single .Replace(Text,"regexp","regexp ") where the ending blank space is from the form "char" + " "
So basically my question is, what could be the regexp for this patterns? In the 5th case, how can I search for the line terminator so it does not try to search for a newline after the last dot of a parragraph.
Can I use Chr(10) for newline and Chr(32) for space?
Btw, I've been following these references:
How to use RegExp
VBA Split strings

Upvotes: 1

Views: 1395

Answers (1)

Bond
Bond

Reputation: 16311

This pattern will find any alphanumeric character, ., or , followed by optional whitespace and then a newline and replace it with the matched ending character, followed by one space.

Dim re
Set re = CreateObject("VBScript.RegExp")
re.Pattern = "([\w.,])\s*\n"
strText = re.Replace(strText, "$1 ")

Output from testing. Ignore the brackets. They're just there to show spaces.

[the end.]     = [the end.]
[the end.\n]   = [the end. ]
[the end. \n]  = [the end. ]
[the end.  \n] = [the end. ]
[the end,]     = [the end,]
[the end,\n]   = [the end, ]
[the end, \n]  = [the end, ]
[the end,  \n] = [the end, ]
[the end]      = [the end]
[the end\n]    = [the end ]
[the end \n]   = [the end ]
[the end  \n]  = [the end ]

Upvotes: 3

Related Questions