Reputation: 35
I am working on a chatbot project using NLP. I am using spaCy and I want to get the POS tags of the tokens in a sentence. Currently I am using this code:
import spacy

en = spacy.load("en_core_web_md")
pos_sent = "lib/lzma.py this module provides classes and convenience functions for compressing and decompressing data using the lzma compression algorithm."
pos_sent = en(pos_sent)
for token in pos_sent:
    print(token, token.pos_)
But this also splits tokens on symbols, which I don't want. For example, it treats "lib", "/", "lzma.py" as separate tokens, but in the original sentence it is one whole word. Is there some way I can get the POS of the word without it being split on symbols?
Upvotes: 1
Views: 490
Reputation: 3174
Well, your text is not really natural language / full sentences, so the model doesn't know what to do with that path and treats it as two words separated by a slash.
You can add a special-case rule to the tokenizer or create a custom tokenizer class in spaCy; see https://spacy.io/usage/linguistic-features#special-cases and https://spacy.io/usage/linguistic-features#native-tokenizers. It might get a bit tricky with paths/URLs, though.
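As a minimal sketch of the special-case route (using the path from your question; every string you want to protect needs its own rule):

import spacy
from spacy.symbols import ORTH

en = spacy.load("en_core_web_md")

# Keep this exact string as a single token; special cases match whole
# whitespace-delimited substrings, so the slash and dot are fine here.
en.tokenizer.add_special_case("lib/lzma.py", [{ORTH: "lib/lzma.py"}])

doc = en("lib/lzma.py this module provides classes for compressing data.")
for token in doc:
    print(token.text, token.pos_)

Now "lib/lzma.py" comes back as one token instead of three. For arbitrary paths you would need the custom-tokenizer route instead, e.g. a regex hooked into the tokenizer's token_match, which is why it can get tricky.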
Or you do the tokenization entirely outside of spaCy (see @Stef's comment) and then hand pretokenized sentences to spaCy, as sketched below. If you only need tokenization and part-of-speech tagging, you should also check other frameworks/methods like NLTK to see if they handle it more the way you would like, as each model does this based on how it was trained.
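A minimal sketch of the pretokenized route, assuming the word list comes from your own tokenizer (the words here are illustrative):

import spacy
from spacy.tokens import Doc

en = spacy.load("en_core_web_md")

# Pretokenized input, e.g. from your own splitter
words = ["lib/lzma.py", "provides", "classes", "for", "compressing", "data", "."]

# Build a Doc directly from the words, bypassing spaCy's tokenizer,
# then run the remaining pipeline components (tagger etc.) over it.
doc = Doc(en.vocab, words=words)
for _, component in en.pipeline:
    doc = component(doc)

for token in doc:
    print(token.text, token.pos_)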
Also, if you are using spaCy for a chatbot (in real time), you should disable the components you don't need to speed up processing.
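For example (the component names "parser" and "ner" are the usual ones but can differ per model/version; check en.pipe_names):

import spacy

# Load the model without the components not needed for POS tagging
en = spacy.load("en_core_web_md", disable=["parser", "ner"])
print(en.pipe_names)  # what is left, e.g. the tagger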
Upvotes: 1