eirikdaude
eirikdaude

Reputation: 3254

Replacing submatches

I have a list of values in Excel, looking something like this:

J0-315 Sirkulasjonspumpe So2 
J0-318 Sirkulasjonspumpe So2 
Sirkulasjonspumpe J0-321
Sirkulasjonspumpe So2 J0-324
Sirkulasjonspumpe So2 J0-327
Sirkulasjonspumpe So2 J0-330
S1-179 Spjeld Ut Fra Pumpe 
S1-187 Spjeld Ut Fra Pumpe 
Spjeld Ut Fra Pumpe S1-195
Spjeld Ut Fra Pumpe S1-203
Spjeld Ut Fra Pumpe S1-211
Spjeld Ut Fra Pumpe S1-219
Ventil Inn Y227
S1-181 Spjeld Ut Fra Pumpe 
S1-189 Spjeld Ut Fra Pumpe 
Spjeld Ut Fra Pumpe S1-197
Spjeld Ut Fra Pumpe S1-205
Spjeld Ut Fra Pumpe S1-213
Spjeld Ut Fra Pumpe S1-221
Ventil Mot Tankfarm Y234
Ventil Ut Y225

Now, I want to have each word except the first one of each line changed to lowercase, unless it is a machine number (e.g. J0-315, Y225). Ideally I'd have chemical compounds (SO2) changed to uppercase as well, but there are few enough instances of this that I'll just do that by hand afterwards :-p

Now, I've come up with a macro which prints out all the values I want to have changed, but I have trouble figuring out how to just change the submatches to lowercase, and some cursory googling fails to give me any insights as well.

Can someone here give me a clue on how to solve this problem?

The code I have come up with so far can be found below:

Sub dlgho()
    Dim c As Range
    Dim regex As New RegExp
    Dim matches As MatchCollection
    Dim m As Match
    Dim i As Long
    
    With regex
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = "([A-ZÆØÅ])[^0-9\-]"
    End With
    
    For Each c In Sheet8.Range("A1:A896")
        If regex.Test(c) Then
            Set matches = regex.Execute(c)
            If matches.Count > 1 Then
                For i = 1 To matches.Count - 1
                    Set m = matches(i)
                    Debug.Print m
                    Debug.Print m.SubMatches(0)
                Next i
            End If
        End If
    Next c
End Sub

To be clear, I want the characters printed in the second print-statement to be changed to lowercase, but I am uncertain about how to go about this.

The expected output would be

J0-315 sirkulasjonspumpe so2 
J0-318 sirkulasjonspumpe so2 
Sirkulasjonspumpe J0-321
Sirkulasjonspumpe so2 J0-324
Sirkulasjonspumpe so2 J0-327
Sirkulasjonspumpe so2 J0-330
S1-179 spjeld ut fra pumpe 
S1-187 spjeld ut fra pumpe 
Spjeld ut fra pumpe S1-195
Spjeld ut fra pumpe S1-203
Spjeld ut fra pumpe S1-211
Spjeld ut fra pumpe S1-219
Ventil inn Y227
S1-181 spjeld ut fra pumpe 
S1-189 spjeld ut fra pumpe 
Spjeld ut fra pumpe S1-197
Spjeld ut fra pumpe S1-205
Spjeld ut fra pumpe S1-213
Spjeld ut fra pumpe S1-221
Ventil mot tankfarm Y234
Ventil ut Y225

Upvotes: 1

Views: 115

Answers (4)

T.M.
T.M.

Reputation: 9948

Assuming digit(s) as last string character and typical identifier for a machine number (or for a handful chemical formulae like SO2) would allow to code a simple non-regex alternative as follows:

  • start with lower case string (except 1st character)
  • split the input string into words (using blanks " " as delimiter)
  • identify machine numbers (or: SO2) via If IsNumeric(Right$(words(ii), 1)) changing them to Upper Case
  • rejoin all split tokens

Sub FollowingAlpha2LowerCase()
    Const Blank As String = " "
    '1) get data
    Dim ws As Worksheet: Set ws = Sheet1
    Dim lastRow As Long: lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    Dim data
    data = ws.Range("A1:A" & lastRow).Value
    '2) Replace following alphabetic words to Lower Case
    Dim i As Long
    For i = 1 To UBound(data)
        'start with lower case string
        data(i, 1) = Left(data(i, 1), 1) & LCase(Mid(data(i, 1), 2))
        'isolate words
        Dim words: words = Split(data(i, 1), Blank)
        'analyze words and rebuild string
        Dim ii As Long
        For ii = 1 To UBound(words)
            If IsNumeric(Right$(words(ii), 1)) Then words(ii) = UCase(words(ii))
        Next
        data(i, 1) = Join(words, Blank)             ' assign new text
    Next i
    '3) write to target
    ws.Range("B1").Resize(UBound(data), 1) = data
End Sub

Results

Upvotes: 0

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60259

If you want to use Regex, here is a method.

  • Use a regex that does not involve a Capturing group
  • Use the .FirstIndex property to avoid the first word
  • I did not alter any words that had numbers included
  • I used the WorksheetFunction.Replace method to do the replacements
  • Note we use a similar technique to capitalize things like SO2

Edit to capitalize SO2 and similar

Option Explicit
Sub removeCaps()
    Dim RE As RegExp, MC As MatchCollection, M As Match
    Dim vData As Variant
    Dim wsSrc As Worksheet, wsRes As Worksheet, rRes As Range
    Dim I As Long
    

'set source and destination range/worksheets
Set wsSrc = Worksheets("Sheet14")
Set wsRes = Worksheets("Sheet14")
    Set rRes = wsRes.Cells(1, 3)
    
'read data into array for faster processing
With wsSrc
    vData = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp))
End With

Set RE = New RegExp
With RE
    .Global = True
    .IgnoreCase = False
    .Pattern = "[A-Z](?!\S*\d)"
    For I = 1 To UBound(vData, 1)
        If .Test(vData(I, 1)) = True Then
            Set MC = .Execute(vData(I, 1))
            For Each M In MC
                If M.FirstIndex <> 0 Then
                    vData(I, 1) = WorksheetFunction.Replace(vData(I, 1), M.FirstIndex + 1, 1, LCase(M))
                End If
            Next M
        End If
    Next I

'now capitalize chemical compound
    .Pattern = "\w*\d+\w*"
    For I = 1 To UBound(vData, 1)
        If .Test(vData(I, 1)) = True Then
            Set MC = .Execute(vData(I, 1))
                For Each M In MC
                    vData(I, 1) = WorksheetFunction.Replace(vData(I, 1), M.FirstIndex + 1, M.Length, UCase(M))
                Next M
        End If
    Next I
End With

With rRes.Resize(rowsize:=UBound(vData, 1))
    .EntireColumn.Clear
    .Value = vData
    .Style = "Output" 'this is NOT an internationally aware property
    .EntireColumn.AutoFit
End With
     
End Sub

enter image description here

Upvotes: 1

JvdV
JvdV

Reputation: 75870

If the common identifier for machine-numbers is that they end in three digits, you could use native functionality too:

enter image description here

Formula in B1:

=LET(X,FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s"),Y,TEXTJOIN(" ",,IF(ISNUMBER(--RIGHT(X,3)),X,LOWER(X))),UPPER(LEFT(Y))&MID(Y,2,LEN(Y)))

Or:

=LET(X,FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s"),Y,SEQUENCE(COUNTA(X)),TEXTJOIN(" ",,IF(Y>1,IF(ISNUMBER(--RIGHT(X,3)),X,LOWER(X)),X)))

And if one has access to REDUCE(), you could try:

=TRIM(REDUCE("",FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s"),LAMBDA(a,b,a&" "&IF(OR(a="",ISNUMBER(--RIGHT(b,3))),b,LOWER(b)))))

If vba, and regex, is a must for you. maybe try something along these lines:

Sub Test()

Dim ws As Worksheet: Set ws = ThisWorkbook.Worksheets("Sheet8")
Dim lr As Long, x As Long

lr = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
arr = ws.Range("A1:A" & lr).Value

With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = " [A-ZÆØÅ](?!\d)"
    For x = LBound(arr) To UBound(arr)
        If .Test(arr(x, 1)) Then
            Set matches = .Execute(arr(x, 1))
            For Each submatch In matches
                arr(x, 1) = Application.Replace(arr(x, 1), submatch.FirstIndex + 1, submatch.Length, LCase(submatch))
            Next
        End If
    Next
End With

ws.Range("B1:B" & lr).Value = Application.Index(arr, 0, 1)

End Sub

Upvotes: 2

Raymond Wu
Raymond Wu

Reputation: 3387

Non-regex method, editing cell 1 by 1 is also slow and inefficient so a faster way is to transfer the values of the range to an array and process the array instead.

This will also make SO2 as uppercase.

Full code below:

Sub dlgho()
    Const whiteList As String = "SO2"
    Const inputRange As String = "A1:A896"
    
    Dim inputArr As Variant
    inputArr = Sheet1.Range(inputRange).Value
    
    Dim wordArr() As String
    
    Dim i As Long
    Dim n As Long
    
    For n = 1 To UBound(inputArr)
                
        wordArr = Split(inputArr(n, 1), " ")
       
        For i = 1 To UBound(wordArr)
            If InStr(wordArr(i), "-") = 0 And _
                Not IsNumeric(Mid$(wordArr(i), 2)) Then
                
                wordArr(i) = LCase(wordArr(i))
            End If
            
            If UCase(wordArr(i)) = whiteList Then wordArr(i) = UCase(wordArr(i))
        Next i
        inputArr(n, 1) = Join(wordArr, " ")
    Next n
    
    Sheet1.Range(inputRange).Value = inputArr
End Sub

Upvotes: 2

Related Questions