Reputation: 19
I am looking to split a cell after the first instance of a word with multiple caps.
Example 1:
Input1: Floor 4 InformatiqueNoosavilleSep
Desired Output1: NoosavilleSep
Pattern: The split should occurs at the second instance of uppercase in the last word. "InformatiqueNoosavilleSep"
Example 2:
Input: Floor 13 InformatiqueSurfers ParadiseSep
Output: Surfers ParadiseSep
Pattern: The split shouldn't occurs at the last word but on "InformatiqueSurfers" instead.
The problem: The pattern to find where to split the word differs from a cell to another.
What we know:
1: If the last word contains three uppercase letter it's always at this word that we want to split the string.Example1
2: If the last word contains only two uppercase letter "ParadiseSep" we have to split the string on the the word before it. Example2
I found this code which allow to split string with caps lock and add a spaces.
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "([a-z])([A-Z])"
SplitCaps = .Replace(strIn, "$1 $2")
End With
End Function
As I am trying to learn VBA this Regex is getting a bit out of my realm.
Thanks for reading!
Upvotes: 1
Views: 1278
Reputation: 643
You are looking for the first instance of a word with multiple caps. Let's build a regular expression to find that as a first step.
You are looking for uppercase letters: [A-Z]
They must be in the same word, so between the two uppercase letters you can only have lowercase letters: [a-z]
There can be zero or more lowercase letters between the 2 uppercase letters: [A-Z][a-z]*[A-Z]
Anything after this occurrence should be part of the result, so we are also interest in the remainder of this string. This can be zero or more occurrences of any character: [A-Z][a-z]*[A-Z].*
Before this occurrence, we also can have zero or more occurrences of any character. But we want to find the first instance, so we will have to make that part of our regular expression "non-greedy" by appending a ?
to it: .*?[A-Z][a-z]*[A-Z].*
Now we have completed a regular expression to find what you were looking for.
Second step: You want to split the word starting at the second uppercase letter within the found pattern. So we put parentheses (round brackets) starting just before that uppercase letter and after the end of the expression: .*?[A-Z][a-z]*([A-Z].*)
Third step: You want to retrieve the part of the string which you isolated in the second step using the parentheses. We do that by using $1
, meaning the contents of the first set of parentheses in the regular expression: .Replace(inputString, "$1")
If we put this in a vba function, we would get something like:
Function SplitMultipleCaps(inputString As String)
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = False
.Pattern = ".*?[A-Z][a-z]*([A-Z].*)"
SplitMultipleCaps = .Replace(inputString, "$1")
End With
End Function
Note: I put .Global = False
because you only want to do it once. The function might also work with .Global = True
, but I felt that False
was more appropriate in this case.
Upvotes: 3
Reputation: 679
Try this:
Function afterFirstUpperCaseWord(strIn As String)
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "\w+ \d+ [A-Z][a-z]+ ?(.*)"
afterFirstUpperCaseWord = .Replace(strIn, "$1")
End With
End Function
Upvotes: 0