Reputation: 29
Any idea how to Identify and extract noun and modifier using VBA (excel)
Example:
ball valve 2in for green pump with gasket
Should be: ball valve
Any help will be appreciated
Upvotes: 1
Views: 3354
Reputation: 562
There are some different approaches, depending on the type of sentence you expect. In your example, the two words you want to extract are on the beginning of the sentence, and separated by whitespaces. If you expect this to be always the case, then you could use something simple as
Function getNoun(ByVal sentence As String)
getNoun = ""
pos1 = InStr(1, sentence, " ") 'find the first whitespace
If pos1 <= 0 Then
getNoun = sentence 'if no whitespace, then assume there is only the noun
Exit Function
End If
pos2 = InStr(pos1 + 1, sentence, " ") 'find the second whitespace
If pos2 <= 0 Then
getNoun = sentence 'if no second whitespace, then assume there is only the noun and qualifier
Exit Function
End If
getNoun = Left(sentence, pos2 - 1) 'if there are two or more spaces, get all chars before the second one
End Function
Tests in immediate window:
? getNoun("ball valve 2in for green pump with gasket")
ball valve
? getNoun("ball valve")
ball valve
? getNoun("ball")
ball
If your scenario is more complex and you need to use specific criteria to determine which words are the desired noun and qualifier, you would probably find use for the Regex COM class (see this topic for example).
EDIT: Based on the comments, I understand that positions are variable, and that it is acceptable to use the MS Word thesaurus as a reference. If the code will run in Microsoft Word, the following function will tell you whether or not a word is a noun:
Function is_noun(ByVal wrd As String)
Dim s As Object, l As Variant
is_noun = False
Set s = SynonymInfo(wrd)
Let l = s.PartOfSpeechList
If s.MeaningCount <> 0 Then
For i = LBound(l) To UBound(l)
If l(i) = wdNoun Then
is_noun = True
End If
Next
End If
End Function
If you are not running on MS Word (your tags suggest MS Excel) but MS Word is installed in the target system, then you can adapt the above code to use MS Word COM automation object.
Then you can extract the first noun, and the next word - if any -, from a sentence, with something like this
Function getNoun(ByVal sentence As String)
getNoun = ""
Dim wrds() As String
wrds = Split(sentence)
For i = LBound(wrds) To UBound(wrds)
If is_noun(wrds(i)) Then
getNoun = wrds(i)
If i < UBound(wrds) Then
getNoun = getNoun & " " & wrds(i + 1)
End If
Exit Function
End If
Next
End Function
Notice, however, that with this you are trusting blindly in MS Word's word database and may get weird results if your sentences contain, for example, words that may be a verb or a noun depending on context. Also, the above example will use the default language of your setup of MS Word (it is possible to use a different one - if installed - by including a language parameter in SynonymInfo
)
Upvotes: 1