user10062624

How can I make compound words singular using an NLP library?

Issue

I'm trying to convert plural compound words to their singular form using spaCy.

However, reading lemma_ from the result of nlp(...) raises an error, so I cannot get the singular form of the whole phrase.

How can I get the output below?

cute dog
two or three word
the christmas day

Development Environment

Python 3.9.1

Error

    print(str(nlp(word).lemma_))
AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'lemma_'

Code

import spacy
nlp = spacy.load("en_core_web_sm")

words = ["cute dogs", "two or three words", "the christmas days"]

for word in words:
    print(str(nlp(word).lemma_))

Trial

Iterating over the tokens works, but it prints each lemma on its own line instead of the whole phrase:

import spacy
nlp = spacy.load("en_core_web_sm")

words = ["cute dogs", "two or three words", "the christmas days"]

for word in words:
    word = nlp(word)
    for token in word:
        print(str(token.lemma_))

Output:

cute
dog
two
or
three
word
the
christmas
day

Upvotes: 1

Views: 299

Answers (1)

polm23

Reputation: 15633

As you've found out, you can't get the lemma of a Doc, only of individual tokens. Lemmas are defined for single words, not for multi-word expressions. Conveniently, though, English compounds are usually pluralized by pluralizing only the last word, so you can make the phrase singular by lemmatizing just the last token. Here's an example:

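import spacy

nlp = spacy.load("en_core_web_sm")


def make_compound_singular(text):
    doc = nlp(text)

    # Nothing to lemmatize in an empty string.
    if len(doc) == 0:
        return text
    # A single token: its lemma is the singular form.
    if len(doc) == 1:
        return doc[0].lemma_
    # Keep every token but the last unchanged, restore the whitespace
    # that preceded the last token, then append the last token's lemma.
    return doc[:-1].text + doc[-2].whitespace_ + doc[-1].lemma_


texts = ["cute dogs", "two or three words", "the christmas days"]
for text in texts:
    print(make_compound_singular(text))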
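
If you want to see the "only change the last word" idea in isolation, here is a minimal sketch that does not need spaCy or a model download. Note the assumptions: naive_singular and singularize_last_word are hypothetical helper names, and naive_singular is a rough suffix rule standing in for a real lemmatizer. It only covers a few regular plurals and will get irregular nouns, and singular words that end in "s", wrong.

```python
def naive_singular(word):
    """Hypothetical stand-in for a lemmatizer: a few regular plural rules only."""
    lower = word.lower()
    # "berries" -> "berry"
    if lower.endswith("ies") and len(word) > 3:
        return word[:-3] + "y"
    # "boxes" -> "box", "churches" -> "church", "classes" -> "class"
    if lower.endswith(("ches", "shes", "sses", "xes")):
        return word[:-2]
    # "dogs" -> "dog"; leave words such as "glass", "bus", "axis" alone
    if lower.endswith("s") and not lower.endswith(("ss", "us", "is")):
        return word[:-1]
    return word


def singularize_last_word(text, singularize=naive_singular):
    """Apply `singularize` to the last space-separated word, keep the rest as-is."""
    head, sep, last = text.rstrip().rpartition(" ")
    return head + sep + singularize(last)


for phrase in ["cute dogs", "two or three words", "the christmas days"]:
    print(singularize_last_word(phrase))
```

The singularize parameter is there so you can pass in something better than the naive rule, for example a function that returns the spaCy lemma of a single word.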

Upvotes: 0
