Reputation: 35
I am trying to extract the person's name from a resume, but I am not getting the correct output. This is what I have done so far:
import spacy
import pdfplumber
# load the small English model (loading it once is enough)
nlp = spacy.load("en_core_web_sm")
pdf = pdfplumber.open('C:/Person.pdf')
page = pdf.pages[0]
doc = nlp(page.extract_text())
print([(X.text, X.label_) for X in doc.ents if X.label_ == 'PERSON'])
My output is:
[('Mohamme [email protected]\n', 'PERSON'), ('Mangalore', 'PERSON'), ('Demo Design1', 'PERSON'), ('Demo Design2', 'PERSON'), ('Demo Design3', 'PERSON'), ('Java', 'PERSON')]
I tried many things but could not get only the names. The output also includes skills, the email address, etc.
How can I extract all the details from a resume, for example skills, phone number, name, years of experience, and email?
Upvotes: 1
Views: 7348
Reputation: 412
Using spacy to extract the first and last names
First we define a pattern to search for in the text. The pattern is based on the observation that a person's first name and last name are usually tagged as proper nouns, so we tell spaCy to look for two consecutive tokens whose part-of-speech tag is PROPN (proper noun). This is a heuristic: other pairs of proper nouns (cities, company names, skills) can match as well, so the function returns the first match on the assumption that the name appears at the top of the resume.
import spacy
from spacy.matcher import Matcher

# load pre-trained model
nlp = spacy.load('en_core_web_sm')

# initialize matcher with a vocab
matcher = Matcher(nlp.vocab)

# First name and last name are usually tagged as proper nouns
pattern = [{'POS': 'PROPN'}, {'POS': 'PROPN'}]
# spaCy v3 signature; on spaCy v2 use matcher.add('NAME', None, pattern)
matcher.add('NAME', [pattern])

def extract_name(resume_text):
    nlp_text = nlp(resume_text)
    matches = matcher(nlp_text)
    # return the first pair of consecutive proper nouns, if any
    for match_id, start, end in matches:
        span = nlp_text[start:end]
        return span.text
    return None
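For the email and phone number asked about in the question, a named-entity model is usually not needed; plain regular expressions tend to work better. The following is only a rough sketch using the standard library: the names EMAIL_RE, PHONE_RE, extract_emails and extract_phones are made up for illustration, and the patterns are simplified assumptions that will not cover every email or phone format.

```python
import re

# Simplified email pattern (assumption: not a full RFC 5322 matcher)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Optional "+", then digits separated by spaces, dots, dashes or parentheses
PHONE_RE = re.compile(r"\+?\d[\d ().-]{8,}\d")

def extract_emails(text):
    """Return all substrings that look like email addresses."""
    return EMAIL_RE.findall(text)

def extract_phones(text):
    """Return phone-like substrings that contain 10 to 15 digits."""
    phones = []
    for match in PHONE_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        # the digit-count check drops things like year ranges ("2015 - 2018")
        if 10 <= len(digits) <= 15:
            phones.append(match.group())
    return phones

sample = (
    "Jane Doe\n"
    "jane.doe@example.com\n"
    "Phone: +1 555-010-9999\n"
    "Experience: 2015 - 2018\n"
)
print(extract_emails(sample))
print(extract_phones(sample))
```

You would pass the text from page.extract_text() to these functions. Skills and years of experience are harder and usually need a keyword list or a custom-trained model rather than a single pattern.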
Upvotes: 3