yair
yair

Reputation: 29

Identify and extract noun and modifier

Any idea how to Identify and extract noun and modifier using VBA (excel)

Example:

ball valve 2in for green pump with gasket

Should be: ball valve

Any help will be appreciated

Upvotes: 1

Views: 3354

Answers (1)

dnep
dnep

Reputation: 562

There are some different approaches, depending on the type of sentence you expect. In your example, the two words you want to extract are on the beginning of the sentence, and separated by whitespaces. If you expect this to be always the case, then you could use something simple as

Function getNoun(ByVal sentence As String)
    getNoun = ""
    pos1 = InStr(1, sentence, " ") 'find the first whitespace
    If pos1 <= 0 Then
        getNoun = sentence 'if no whitespace, then assume there is only the noun
        Exit Function
    End If
    pos2 = InStr(pos1 + 1, sentence, " ") 'find the second whitespace
    If pos2 <= 0 Then
        getNoun = sentence 'if no second whitespace, then assume there is only the noun and qualifier
        Exit Function
    End If

    getNoun = Left(sentence, pos2 - 1) 'if there are two or more spaces, get all chars before the second one
End Function

Tests in immediate window:

? getNoun("ball valve 2in for green pump with gasket")
ball valve
? getNoun("ball valve")
ball valve
? getNoun("ball")
ball

If your scenario is more complex and you need to use specific criteria to determine which words are the desired noun and qualifier, you would probably find use for the Regex COM class (see this topic for example).

EDIT: Based on the comments, I understand that positions are variable, and that it is acceptable to use the MS Word thesaurus as a reference. If the code will run in Microsoft Word, the following function will tell you whether or not a word is a noun:

 Function is_noun(ByVal wrd As String)
  Dim s As Object, l As Variant
  is_noun = False
  Set s = SynonymInfo(wrd)
  Let l = s.PartOfSpeechList
  If s.MeaningCount <> 0 Then
      For i = LBound(l) To UBound(l)
          If l(i) = wdNoun Then
              is_noun = True
          End If
      Next
  End If
End Function

If you are not running on MS Word (your tags suggest MS Excel) but MS Word is installed in the target system, then you can adapt the above code to use MS Word COM automation object.

Then you can extract the first noun, and the next word - if any -, from a sentence, with something like this

Function getNoun(ByVal sentence As String)
   getNoun = ""
   Dim wrds() As String
   wrds = Split(sentence)
   For i = LBound(wrds) To UBound(wrds)
        If is_noun(wrds(i)) Then
            getNoun = wrds(i)
            If i < UBound(wrds) Then
                getNoun = getNoun & " " & wrds(i + 1)
            End If
            Exit Function
        End If
    Next
End Function

Notice, however, that with this you are trusting blindly in MS Word's word database and may get weird results if your sentences contain, for example, words that may be a verb or a noun depending on context. Also, the above example will use the default language of your setup of MS Word (it is possible to use a different one - if installed - by including a language parameter in SynonymInfo)

Upvotes: 1

Related Questions