Reputation: 7476
I'm collecting some of the tokens in a Dict for further use. The problem is that I need one token to play the role of None/NIL when I don't find what I need in the doc. It should act as the no-value case, i.e. still have all the attributes (the string value could be, say, some special character). In other words, it should act like a Token without being a token from the doc.
Is there a way to create such a Token? Or maybe copy an existing one and modify .dep_, .pos_, etc.?
Upvotes: 2
Views: 911
Reputation: 1
I had the same issue: I needed an empty Token with no information attached (no text, no children, no dependency, etc.) that could still be passed to a spaCy module later on, so that the code doesn't crash when iterating over several tokens if one of the iterations fails to find a token.
I solved the issue by creating a Token instance with index -1 from a given text:
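import spacy
from spacy.tokens import Token
nlp = spacy.load("en_core_web_sm")  # assumption: the answer does not say which pipeline was loaded
text = "This is a very simple text."
doc = nlp(text)
# Build a Token at offset -1, i.e. not one of the doc's real tokens
empty_token = Token(nlp.vocab, doc, -1)
assert empty_token.text == ''
assert list(empty_token.children) == []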
empty_token is a normal token: it has all the attributes, but they are set to empty values. Running the following code produces empty outputs:
print('text:', empty_token.text)
print('pos:', empty_token.pos_)
print('dep:', empty_token.dep_)
# Output
# text:
# pos:
# dep:
Upvotes: 0
Reputation: 2023
One approach would be to create a doc containing just one character (e.g. a space, *, or any other character of your choice), i.e. nlp('<special character>'), and take the token at index 0. For example, if your special character is #, this would look like:
empty_token = nlp('#')[0]
empty_token is a normal token, so it has all the attributes. Running the code below produces the corresponding output:
print('text:', empty_token.text)
print('pos:', empty_token.pos_)
print('dep:', empty_token.dep_)
Output:
text: #
pos: SYM
dep: ROOT
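If the placeholder only needs to be read from your own code and is never handed back to spaCy, a plain-Python stand-in might be enough. The sketch below is not a spaCy API: NullToken, NO_TOKEN and the "object" key are hypothetical names chosen for illustration, and the class mimics only the few attributes printed above (text, pos_, dep_, children).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NullToken:
    """Hypothetical stand-in that mimics a few spaCy Token attributes."""
    text: str = "#"         # special character marking "no value"
    pos_: str = ""
    dep_: str = ""
    children: Tuple = ()    # empty, so iterating over children is safe


# One shared sentinel instance, so it can be detected with `is`.
NO_TOKEN = NullToken()

# In real use this dict would map names to tokens taken from a doc.
collected = {}

# Fall back to the sentinel when nothing was found for a key.
obj = collected.get("object", NO_TOKEN)

if obj is NO_TOKEN:
    print("no token found")
print("text:", obj.text)
print("children:", list(obj.children))
```

Because this object is not a real Token, anything that expects a spaCy token (for example code that reads obj.doc or obj.head) would still fail, so the nlp('#')[0] approach is the safer choice in that case.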
Upvotes: 1