Faseela Thayattuchira

Reputation: 527

ScispaCy in Google Colab

I am trying to build an NER model for clinical data using ScispaCy in Colab. I installed the packages like this:

!pip install spacy
!pip install scispacy
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.4/en_core_sci_md-0.2.4.tar.gz  # pip install <Model URL>

Then I imported them with:

import scispacy
import spacy
import en_core_sci_md

Then I used the following code to display the sentences and entities:

nlp = spacy.load("en_core_sci_md")
text ="""Myeloid derived suppressor cells (MDSC) are immature myeloid cells with immunosuppressive activity. They accumulate in tumor-bearing mice and humans with different types of cancer, including hepatocellular carcinoma (HCC)""" 
doc = nlp(text)
print(list(doc.sents))
print(doc.ents)

I am getting the following error:

OSError: [E050] Can't find model 'en_core_sci_md'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

I don't know why this error occurs; I followed all the code from the official ScispaCy GitHub page. Any help would be appreciated. Thanks in advance.

Upvotes: 3

Views: 2628

Answers (1)

Ledian K.

Reputation: 595

I hope I am not too late... I believe you are very close to the correct approach.

I will write my answer in steps and you can choose where to stop.

Step 1)

# Install the en_core_sci_lg model package (large model) from the ScispaCy release URL; you can also use en_core_sci_md for the medium model.
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.4/en_core_sci_lg-0.2.4.tar.gz
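
Before moving on, it can help to confirm that the model package is visible to the running Python environment, since E050 is raised when spaCy cannot resolve the model name. The following is a minimal sketch, not something from the ScispaCy docs: `model_is_installed` is a hypothetical helper, and it assumes the installed package name matches the model name.

```python
import importlib.util


def model_is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found for import."""
    return importlib.util.find_spec(package_name) is not None


# Check both model names mentioned in this thread.
for name in ("en_core_sci_md", "en_core_sci_lg"):
    print(name, "installed:", model_is_installed(name))
```

If this prints False right after the pip install, restarting the Colab runtime and re-running the import may be worth trying; whether that is the actual cause of the error in the question is an assumption.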

Step 2)

# Import the large model package
import en_core_sci_lg

Step 3)

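# Identify entities (`text` is the string defined in the question)
from spacy import displacy

nlp = en_core_sci_lg.load()
doc = nlp(text)
# With jupyter=True, render displays the markup inline, so there is nothing to assign
displacy.render(doc, jupyter=True, style="ent")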

Step 4)

# Print only the entities
print(doc.ents)

Step 5)

# Save the result as a list of entity strings
save_res = [ent.text for ent in doc.ents]
save_res

Step 6)

# Save the results to a DataFrame
import pandas as pd

df_save_res = pd.DataFrame(save_res)
df_save_res
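
If you want more than the entity text in the table, one option is to pull a few attributes from each entity span first. This is only a sketch: `entities_to_rows` is a hypothetical helper, and the stand-in object below exists so the snippet runs without a model (the label value is a placeholder). It relies on the spaCy span attributes `text`, `label_`, `start_char` and `end_char`.

```python
from types import SimpleNamespace


def entities_to_rows(doc):
    """Turn each entity span into a plain dict of text, label and character offsets."""
    return [
        {
            "text": ent.text,
            "label": ent.label_,
            "start": ent.start_char,
            "end": ent.end_char,
        }
        for ent in doc.ents
    ]


# Stand-in for a processed document; a real spaCy Doc exposes the same attributes.
fake_doc = SimpleNamespace(
    ents=[SimpleNamespace(text="MDSC", label_="ENTITY", start_char=34, end_char=38)]
)
rows = entities_to_rows(fake_doc)
print(rows)
```

With the real `doc` from Step 3, `pd.DataFrame(entities_to_rows(doc))` should give one row per entity with named columns.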

Step 7)

# In case you want to visualise the dependency parse
displacy.render(doc, jupyter=True, style="dep")

Upvotes: 3
