sten

Reputation: 7476

Spacy: Creating empty or error Token instance?

I'm collecting some tokens in a dict for further use. The problem is that I need one token to play the role of None/NIL for the no-value case, when I don't find what I need in the doc. It should still have all the attributes (the string value could be, say, some special character), i.e. act like a Token but not be a token from the doc.

Is there a way to create such a Token? Or maybe copy an existing one and modify .dep_, .pos_, etc.?

Upvotes: 2

Views: 911

Answers (2)

Armary

Reputation: 1

I had the same issue: I needed an empty Token with no information attached (no text, no children, no dependency, etc.) that could still be passed to a spaCy function later on, so that the code doesn't crash when iterating over several tokens and one iteration doesn't find a token.

I solved this by constructing a Token at index -1 of a doc built from an arbitrary text:

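    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumption: any loaded pipeline will do
    text = "This is a very simple text."
    doc = nlp(text)
    # Offset -1 lies outside the doc, so the token carries no information.
    empty_token = spacy.tokens.token.Token(nlp.vocab, doc, -1)
    assert empty_token.text == ''
    assert [t for t in empty_token.children] == []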

empty_token is a normal token: it has all the attributes, but they are set to empty values. Running the following code produces empty outputs:

    print('text:', empty_token.text)
    print('pos:', empty_token.pos_)
    print('dep:', empty_token.dep_)

    # Output
    # text:
    # pos:
    # dep:
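
If constructing a token at index -1 feels too dependent on spaCy internals, another option is a plain-Python stand-in (the null-object pattern). The sketch below is not a spaCy API: `NullToken`, `NIL` and the `found` dict are hypothetical names, and only a handful of Token-like attributes are mirrored. It would only help where your own code reads the attributes; a spaCy function expecting a real Token will most likely reject it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NullToken:
    """Hypothetical stand-in that mimics a few Token attributes."""
    text: str = ""
    pos_: str = ""
    dep_: str = ""
    i: int = -1              # no position in any doc
    children: Tuple = ()     # iterating yields nothing

    def __bool__(self) -> bool:
        # Lets callers write `if tok:` to detect the no-value case.
        return False


NIL = NullToken()

# Hypothetical lookup table; real entries would be spaCy tokens.
found = {"subject": NIL}
tok = found.get("object", NIL)

print(repr(tok.text), repr(tok.pos_), list(tok.children), bool(tok))
```

Because the instance is frozen and falsy, it can be shared as a single sentinel and checked with `tok is NIL` or `if not tok`.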

Upvotes: 0

Joel Oduro-Afriyie

Reputation: 2023

One approach would be to create a doc containing just one character (e.g. a space or * or any other character of your choice), i.e. nlp('<special character>'), and take the token at index 0. For example, if your special character is a #, this would look like:

    empty_token = nlp('#')[0]

empty_token is a normal token, so it has all the attributes. Running the code below produces the corresponding output (the exact tags depend on the loaded pipeline):

    print('text:', empty_token.text)
    print('pos:', empty_token.pos_)
    print('dep:', empty_token.dep_)

Output:

    text: #
    pos: SYM
    dep: ROOT
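
As a variation on this, a one-token doc can also be built directly with `Doc(vocab, words=[...])`, which skips the pipeline, so the attributes start out empty and can be set by hand, as the question asks. This is a minimal sketch assuming a blank English pipeline; the `#` text and the `X` / `dep` labels are placeholder choices, not anything spaCy prescribes.

```python
import spacy
from spacy.tokens import Doc

# Assumption: a blank pipeline is enough, since only its vocab is needed.
nlp = spacy.blank("en")

# Build a standalone one-token doc without running any pipeline components.
placeholder_doc = Doc(nlp.vocab, words=["#"])
nil_token = placeholder_doc[0]

# Overwrite the attributes that should mark the no-value case.
nil_token.pos_ = "X"     # placeholder choice of coarse POS tag
nil_token.dep_ = "dep"   # placeholder choice of dependency label

print('text:', nil_token.text)
print('pos:', nil_token.pos_)
print('dep:', nil_token.dep_)
```

The token keeps a reference to `placeholder_doc`, so it stays valid for as long as `nil_token` is in use, and it never belongs to any of the docs being processed.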

Upvotes: 1
