Reputation: 4632
I came across a situation where I need to get the pos_ and tag_ values from spaCy Doc objects.
For example:
text = "Australian striker John hits century"
doc = nlp(text)
for nc in doc.noun_chunks:
    print(nc)  # Australian striker John
doc[1].tag_  # gives the tag for 'striker'
If I want the pos_ and tag_ for the word 'striker', do I need to pass the sentence to nlp() again?
Also, doc[1].tag_ works, but I need something like doc['striker'].tag_.
Is that possible?
Upvotes: 3
Views: 4950
Reputation: 3096
You only have to process the text once:
text = "Australian striker John hits century"
doc = nlp(text)
for nc in doc.noun_chunks:
    print(nc)
    print([(token.text, token.tag_, token.pos_) for token in nc])
If you only want specific words within the noun chunk, you can filter further (here, by tag) by changing the second print statement to, e.g.:
print([(token.text, token.tag_, token.pos_) for token in nc if token.tag_ == 'NN'])
Note that this may print multiple hits, depending on your model and input sentence.
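Since the question asks for something like doc['striker'], here is a minimal sketch of one way to get close to that: build a lookup from token text to tokens after processing the text once. This is not part of the original answer. The helper name index_tokens_by_text and the model name en_core_web_sm are assumptions for illustration; any loaded pipeline with a tagger should work the same way.

```python
from collections import defaultdict


def index_tokens_by_text(doc):
    """Map each lowercased token text to the list of tokens with that text.

    `doc` can be any iterable of objects with a `.text` attribute,
    such as a spaCy Doc. (Hypothetical helper, not a spaCy API.)
    """
    index = defaultdict(list)
    for token in doc:
        index[token.text.lower()].append(token)
    return index


def demo():
    # Imported here so the helper above has no hard dependency on spaCy.
    import spacy

    # Assumption: the small English model is installed.
    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Australian striker John hits century")  # processed only once

    by_text = index_tokens_by_text(doc)
    # A word can occur more than once, so each entry is a list of tokens.
    for token in by_text["striker"]:
        print(token.text, token.tag_, token.pos_)
```

Calling demo() prints the tag and part of speech for every occurrence of 'striker'. The lookup stores lists because the same word may appear several times in a sentence, possibly with different tags.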
Upvotes: 2
Reputation: 625
You can do the following:
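import re

text = "Australian striker John hits century"
x1 = "striker"
# Escape the word so it is matched literally, ignoring case
x2 = re.compile(re.escape(x1), re.IGNORECASE)
# Character offsets where the word starts in the text
loc_indexes = [m.start(0) for m in x2.finditer(text)]
# token.idx is the token's character offset in the original text
tag = [i.tag_ for i in nlp(text) if i.idx in loc_indexes]
print(x1, tag[0])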
This gives the output:
striker NN
You can easily make this dynamic if required by treating x1 as a variable that holds the word to look up.
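One way to make the lookup dynamic is to wrap it in a function, sketched below. The function name tags_for_word and its parameters are hypothetical. Unlike the snippet above, this sketch adds word boundaries, so a search for 'strike' does not match 'striker'; that is a deliberate change, not the original behaviour.

```python
import re


def tags_for_word(tokens, text, word):
    """Return (text, tag_, pos_) for each token that starts where `word` occurs in `text`.

    `tokens` is any iterable of objects with `.text`, `.tag_`, `.pos_` and `.idx`
    (the character offset of the token), such as a spaCy Doc.
    """
    # \b anchors restrict the match to whole words; re.escape keeps `word` literal.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    starts = {m.start() for m in pattern.finditer(text)}
    return [(t.text, t.tag_, t.pos_) for t in tokens if t.idx in starts]


def demo():
    import spacy  # imported here so the helper itself does not require spaCy

    # Assumption: the small English model is installed.
    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Australian striker John hits century")
    # doc.text is the original string, so the offsets line up with token.idx.
    print(tags_for_word(doc, doc.text, "striker"))
```

The text is still processed only once; the same doc can be passed to tags_for_word for as many words as needed.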
Upvotes: 0