sripandianman

Reputation: 35

How to extract names from a resume in Python

I am trying to extract the person's name from a resume, but I am not getting the correct output. Here is what I have done so far:

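import spacy
import pdfplumber

# Load the small English pipeline once (en_core_web_sm.load() and
# spacy.load("en_core_web_sm") are equivalent, so only one is needed)
nlp = spacy.load("en_core_web_sm")

# Read the text of the first page; the with-block closes the PDF afterwards
with pdfplumber.open('C:/Person.pdf') as pdf:
    page = pdf.pages[0]
    doc = nlp(page.extract_text())

# Keep only the entities the model labels as PERSON
print([(X.text, X.label_) for X in doc.ents if X.label_ == 'PERSON'])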

My output is:

[('Mohamme [email protected]\n', 'PERSON'), ('Mangalore', 'PERSON'), ('Demo Design1', 'PERSON'), ('Demo Design2', 'PERSON'), ('Demo Design3', 'PERSON'), ('Java', 'PERSON')]

I have tried many things, but I cannot get only the names. The result also includes other things such as skills, email addresses, etc.

How can I extract all the details from a resume, for example skills, phone number, name, years of experience, and email?

Upvotes: 1

Views: 7348

Answers (1)

datamansahil

Reputation: 412

Using spaCy to extract the first and last names

We first define a pattern that we want to search for in the text. Here, we use a simple pattern based on the observation that a person's first name and last name are usually proper nouns, so we ask spaCy's Matcher to look for two consecutive tokens whose part-of-speech tag is PROPN (proper noun). This is only a heuristic: other proper-noun pairs such as company names, cities, or job titles also match, and the function simply returns the first match, which on many resumes is the candidate's name at the top.
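To make the pattern concrete without downloading a model, here is a pure-Python sketch of the same "two consecutive PROPN tokens, take the first pair" check, run over tokens that are already tagged. The function name `first_propn_pair` and the hand-written tags are hypothetical stand-ins for spaCy's tagger output; they are not part of spaCy's API.

```python
from typing import List, Optional, Tuple

# A token is a (word, coarse POS tag) pair. The tags below are written by hand
# to stand in for what spaCy's tagger would produce (an assumption).
Token = Tuple[str, str]


def first_propn_pair(tokens: List[Token]) -> Optional[str]:
    """Return the first two consecutive PROPN tokens joined by a space,
    mimicking the Matcher pattern [{'POS': 'PROPN'}, {'POS': 'PROPN'}]."""
    # Walk over adjacent token pairs: (t0, t1), (t1, t2), ...
    for (w1, t1), (w2, t2) in zip(tokens, tokens[1:]):
        if t1 == "PROPN" and t2 == "PROPN":
            return f"{w1} {w2}"
    # No pair of adjacent proper nouns in the input
    return None


# Hypothetical tagged resume header
tagged = [
    ("Jane", "PROPN"), ("Doe", "PROPN"),
    ("Software", "PROPN"), ("Engineer", "PROPN"),
    ("based", "VERB"), ("in", "ADP"), ("Mangalore", "PROPN"),
]
print(first_propn_pair(tagged))  # Jane Doe
```

"Software Engineer" also satisfies the pattern; it is skipped here only because an earlier pair was found first, which is exactly why the heuristic depends on the name appearing before other proper-noun pairs.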

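import spacy
from spacy.matcher import Matcher

# load pre-trained model
nlp = spacy.load('en_core_web_sm')

# initialize matcher with a vocab
matcher = Matcher(nlp.vocab)

# First name and Last name are usually Proper Nouns:
# match two consecutive tokens tagged PROPN
pattern = [{'POS': 'PROPN'}, {'POS': 'PROPN'}]

# Register the pattern once. spaCy v3 expects add(key, [patterns]);
# the old add(key, None, ...) form no longer works. Calling add() inside
# the function would re-add the pattern on every call.
matcher.add('NAME', [pattern])

def extract_name(resume_text):
    nlp_text = nlp(resume_text)

    matches = matcher(nlp_text)

    # Return the text of the first match (the first proper-noun pair)
    for match_id, start, end in matches:
        span = nlp_text[start:end]
        return span.text

    # No two consecutive proper nouns were found
    return None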
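The answer covers only the name. The question also asks about email and phone number; for fields with a fixed shape like these, a regular expression is a common alternative to NER. Below is a minimal sketch using Python's `re` module. The patterns, the helper names `extract_emails` and `extract_phones`, the 10-digit threshold, and the sample text are all illustrative assumptions, not part of spaCy or the original answer, and real resumes will need more robust patterns.

```python
import re
from typing import List

# Deliberately simple, assumed patterns.
# Email: local part, "@", then a domain with at least one dot.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Phone candidate: optional "+", then digits possibly separated by spaces,
# dots, dashes, or parentheses (newlines are not allowed inside a match).
PHONE_RE = re.compile(r"\+?\d[\d ().-]{8,}\d")


def extract_emails(text: str) -> List[str]:
    """Return every email-like substring."""
    return EMAIL_RE.findall(text)


def extract_phones(text: str) -> List[str]:
    """Return phone-like substrings that contain at least 10 digits."""
    candidates = PHONE_RE.findall(text)
    # Drop shorter digit runs such as date ranges ("2015 - 2019").
    return [c for c in candidates if sum(ch.isdigit() for ch in c) >= 10]


# Hypothetical resume text, e.g. the output of page.extract_text()
sample = "Jane Doe\njane.doe@example.com | +91 98765 43210\nWorked 2015 - 2019 in Mangalore"
print(extract_emails(sample))  # ['jane.doe@example.com']
print(extract_phones(sample))  # ['+91 98765 43210']
```

Extracting emails and phone numbers this way also lets you remove them from the text before running the name matcher, so they cannot end up inside a PERSON span.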

Upvotes: 3
