pyoupyou

Reputation: 55

Output text with specifically chosen tokens in curly braces with spaCy

I want to print a sentence in my terminal with some specific words wrapped in curly braces. For instance, if I want the 5th and 7th words of this sentence to be wrapped:

My important word is here and there.

The output should be:

My important word is {here} and {there}.

I want the solution to be in Python, and in particular with spaCy. So far I have managed to write a program like this:

import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp('My important word is here and there.')
my_important_words = [4,6]
for token in doc:
    if token.i in my_important_words:
        print("{"+token.text+"}")
    else:
        print(token.text)

However, my for loop prints the words one per line, and the program also seems rather verbose to me. I cannot believe a library like spaCy does not have a straightforward one- or two-line way to do this.

Any solution?

PS: I know there are fancy displaCy solutions for highlighting words with some labeled property, like this one: Spacy Verb highlight?

But that is not really the same, because 1) my set of words is a list of words/tokens arbitrarily chosen by me, and 2) I do not want displaCy to render HTML. I just want plain printing in my terminal.

Upvotes: 1

Views: 301

Answers (1)

David Espinosa

Reputation: 879

A two-liner for your use case could be:

import re
import spacy

nlp = spacy.load('en_core_web_lg')
doc = nlp('My important word is here and there.')

my_important_words = [4,6]

# First line: this basically does what you're looking for, but adds an extra space before every punctuation character...
output_string = " ".join([token.text if token.i not in my_important_words else '{'+token.text+'}' for token in doc])

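# Second line: removes the extra space before punctuation described above.
# The raw string avoids the invalid '\/' escape, and '\-' keeps the hyphen literal
# instead of forming an accidental ':'-to-'?' character range.
output_string = re.sub(r' ([@.#$/:;,\-?!])', r'\1', output_string)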

# Results
print(output_string)

The previous code prints what you're looking for in the terminal:

My important word is {here} and {there}.
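
If you would rather skip the regex cleanup, one possible alternative is to rebuild the sentence from each token's own trailing whitespace, which spaCy exposes as token.whitespace_. The following is only a minimal sketch under a few assumptions: the helper name brace_tokens is made up for illustration, and the demo uses spacy.blank("en") (a tokenizer-only pipeline) purely so that no model download is needed; a loaded model such as en_core_web_sm should behave the same way for this sentence.

```python
try:
    import spacy
except ImportError:  # the helper below only needs token-like objects, not spaCy itself
    spacy = None


def brace_tokens(tokens, indices):
    """Return the text with the tokens at the given indices wrapped in braces.

    `tokens` can be a spaCy Doc, or any iterable of objects exposing
    `.i`, `.text` and `.whitespace_` (hypothetical helper, not part of spaCy).
    """
    wanted = set(indices)
    return "".join(
        # Re-attach each token's original trailing whitespace instead of
        # joining on " ", so no extra space appears before punctuation.
        ("{" + t.text + "}" if t.i in wanted else t.text) + t.whitespace_
        for t in tokens
    )


if spacy is not None:
    nlp = spacy.blank("en")  # assumption: tokenizer-only pipeline for the demo
    doc = nlp("My important word is here and there.")
    print(brace_tokens(doc, [4, 6]))
```

Because token.whitespace_ holds whatever whitespace followed the token in the original string (or an empty string if there was none), the spacing of the input is reproduced as-is and the braces are the only change.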

Hope it helps.

Upvotes: 1
