How to extract all noun phrases in French sentences with spaCy (Python)

I am trying to extract all the noun phrases from French sentences using spaCy. My code does not work well in all the cases I tried. For example,

    import spacy

    nlp = spacy.load("fr_core_news_sm")
    doc = nlp("Il y a plusieurs petits restaurants dans cette ville.")
    # Print each noun chunk found by the pipeline
    for chunk in doc.noun_chunks:
        print(chunk)

returns

[Il y a plusieurs petits restaurants dans cette ville.] as the noun phrase. This appears to be incorrect, as the noun phrase here is "petits restaurants dans cette ville".

When I tried other sentences, such as "J'ai trouvé une jolie petite chambre.", it returned three phrases, [J', une jolie, petite chambre], which also seems incorrect.

Lastly, with "Les deux dernières semaines, il était à Paris.", it returned [Les deux dernières semaines, il], which appears to be correct.

I would appreciate any help or guidance on how to make the code work correctly for the first two examples as well.

Upvotes: 0

Answers (1)

thorntonc

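First, try updating your version of spaCy:

    pip install spacy --upgrade

Then switch from the small model, fr_core_news_sm, to a larger one such as fr_core_news_lg.

To install it:

    python -m spacy download fr_core_news_lg

or pip install it directly from spaCy's model repository, e.g.

    pip install https://github.com/explosion/spacy-models/releases/download/fr_core_news_lg-2.3.0/fr_core_news_lg-2.3.0.tar.gz

Note that the model version in the URL has to be compatible with your installed spaCy version (the 2.3.0 models are built for spaCy 2.3.x), whereas the download command picks a compatible version for you.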

Larger models are typically more accurate on most NLP tasks, including the dependency parse that noun_chunks relies on.
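To compare the models on the sentences from the question, something like the following sketch may help. The helper names (noun_chunk_texts, load_french_pipeline, main) are made up for illustration and are not part of spaCy. The sketch assumes at least one of the two French models is installed, and it does not guarantee that the larger model fixes every example.

```python
from typing import Callable, List, Sequence


def noun_chunk_texts(nlp: Callable, sentence: str) -> List[str]:
    """Return the text of every noun chunk the pipeline finds in `sentence`."""
    doc = nlp(sentence)
    return [chunk.text for chunk in doc.noun_chunks]


def load_french_pipeline(
    preferred: Sequence[str] = ("fr_core_news_lg", "fr_core_news_sm"),
):
    """Load the first installed model from `preferred` (hypothetical helper)."""
    import spacy  # imported here so the helper above works with any callable

    for name in preferred:
        try:
            return spacy.load(name)
        except OSError:
            # spacy.load raises OSError when the model is not installed
            continue
    raise OSError(f"None of these models are installed: {', '.join(preferred)}")


def main() -> None:
    """Print the noun chunks for the three sentences from the question."""
    nlp = load_french_pipeline()
    sentences = [
        "Il y a plusieurs petits restaurants dans cette ville.",
        "J'ai trouvé une jolie petite chambre.",
        "Les deux dernières semaines, il était à Paris.",
    ]
    for sentence in sentences:
        print(sentence, "->", noun_chunk_texts(nlp, sentence))
```

Calling main() prints one line per sentence, so the output of the small and large models can be compared side by side by changing the order of the names in preferred.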

Upvotes: 1
