michal roesler
michal roesler

Reputation: 499

Extract Uppercase Words on Excel Function

I have supplier name together with product name in one cell as a string.
Each cell has a word that's all uppercase (sometimes with a digit or a number).

enter image description here

Data I need to extract
3LAB Anti - Aging Oil 30ml 3LAB
3LAB Aqua BB SPF40 #1 14g 3LAB
3LAB SAMPLE Perfect Neck Cream 6ml 3LAB
3LAB SAMPLE Super h" Serum Super Age-Defying Serum 3ml" 3LAB
3LAB TTTTT Perfect Mask Lifting Firming Brightening 28ml 3LAB
3LAB The Cream 50ml 3LAB
3LAB The Serum 40ml 3LAB
4711 Acqua Colonia Intense Floral Fields Of Ireland EDC spray 170ml EDC
4711 Acqua Colonia Intense Pure Brezze Of Himalaya EDC spray 50m" EDC

I need to extract only that UPPERCASE supplier name to a new cell.
I've tried to create User Defined Function like this one, but it's not working.
It's returning #NAME? error.

Public Function UpperCaseWords(S As String) As String
  Dim X As Long
  Dim TempText As String
  
  TempText = " " & S & " "   
  
  For X = 2 To Len(TempText) - 1
    If Mid(TempText, X, 1) Like "[!A-Z ]" Or Mid(TempText, X - 1, 3) Like "[!A-Z][A-Z][!A-Z]" Then
      Mid(TempText, X) = " "
    End If
  Next
  UpperCaseWords = Application.Trim(TempText)
End Function

Any idea how to correct it and make it work?

I've found it here: https://www.mrexcel.com/board/threads/formula-to-extract-upper-case-words-in-a-text-string.684934/page-2#posts

And why in this macro, in line For X = 2 To Len(TempText) - 1 the X is set to 2?

Upvotes: 1

Views: 1010

Answers (2)

JvdV
JvdV

Reputation: 75860

Instead of a custom made UDF, try to utilize what Excel does offer through build-in functionality, for examle FILTERXML():

enter image description here

Formula used in B1:

=FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[.*0!=0][translate(.,'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789','')='']")

If you are using Microsoft365, add an @ before the the function e.g: =@FILTERXML()....., or add [1] as the 3rd xpath-expression to tell the function to only return the 1st node that complied against the previous two rules we used.

Let's have an analysis on the formula:

  • FILTERXML() - We can utilize this function to "split" a string on a particular delimiter, a space in this case.
  • "<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>" - Here is the part where we create a parent-child structure of start/end-tags; a valid XML-string.
  • //s - In the 2nd parameter of FILTERXML() we start a valid xpath-expressions. We want all the s-nodes (childres from t-nodes) that comply to the following two rules:
    • [.*0!=0] - Select all nodes that when multiplied with zero are not the same as zero. Meaning we don't want pure numeric substrings returned.
    • [translate(.,'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789','')=''] - When we translate() all the characters mentioned in the 2nd parameter of this function to nothing, the result should also be nothing, meaning the node is only made out of uppercase alpha chars and numbers.

A more in-depth explaination on how FILTERXML() works when you want to extract a particular substring can be found here. Happy coding!

Upvotes: 3

Alex K.
Alex K.

Reputation: 175776

How about: split the text into words, for each word ignore it if its numeric & return it if its the same as the upper case version of itself:

Public Function UpperCaseWords(S As String) As String
    Dim i As Long
    Dim word As String
    Dim words() As String
    
    words = Split(S, " ")
    
    For i = 0 To UBound(words)
        word = words(i)
        If Not IsNumeric(word) And word = UCase$(word) Then
            UpperCaseWords = word
            Exit Function
        End If
    Next
End Function

Upvotes: 1

Related Questions