Reputation: 4608
I want to create a model that detects gender from a full name. I have two dictionaries, one of male names and one of female names, and I want the model to classify previously unseen names.
I need to determine the gender after the NER (named entity recognition) process. This delivers a PERSON entity with any one of these characteristics:
I can make the male vs. female determination on the (given) name alone. The model also needs to handle a SURNAME alone, classifying it as NO_GENDER.
I know that surnames can be noisy, but I must deal with them, because they could be a part of the input.
Upvotes: 0
Views: 1277
Reputation: 77880
First, pre-process the data: in a full-name input, keep only the given name (see below). Apply the same step to unknown input as well.
I suggest that you train a multi-class SVM. You already know the three classes. Make up the following training (labeled) data:
Essentially, you train this to recognize FEMALE, MALE, and everything else.
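A minimal sketch of the three-class setup, assuming the two dictionaries are plain lists of given names and that a list of surnames is available to serve as the NO_GENDER ("everything else") class. To stay dependency-free it hand-rolls a one-vs-rest linear SVM trained with Pegasos-style subgradient descent on character n-grams; in practice a library SVM (for example scikit-learn's `LinearSVC`) would be the usual choice. The names `char_ngrams`, `train_binary_svm`, `OneVsRestSVM` and the toy name lists are all hypothetical.

```python
import random
from collections import defaultdict

def char_ngrams(name, n_min=2, n_max=3):
    """Binary character n-gram features, with ^ and $ marking the name boundaries."""
    padded = "^" + name.lower() + "$"
    feats = {"__bias__": 1.0}  # constant feature acting as a (regularised) bias term
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            feats[padded[i:i + n]] = 1.0
    return feats

def dot(w, x):
    return sum(w.get(f, 0.0) * v for f, v in x.items())

def train_binary_svm(xs, ys, lam=0.01, epochs=20, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style
    subgradient steps. ys must be +1 / -1."""
    rng = random.Random(seed)
    w = defaultdict(float)
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)              # Pegasos step size
            margin = ys[i] * dot(w, xs[i])     # margin under the current weights
            for f in w:                        # L2 shrinkage step
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:                   # hinge loss active: move towards y * x
                for f, v in xs[i].items():
                    w[f] += eta * ys[i] * v
    return dict(w)

class OneVsRestSVM:
    """Multi-class SVM built from one binary SVM per class (one-vs-rest)."""

    def __init__(self, lam=0.01, epochs=20):
        self.lam = lam
        self.epochs = epochs

    def fit(self, names, labels):
        xs = [char_ngrams(n) for n in names]
        self.classes_ = sorted(set(labels))
        self.weights_ = {
            c: train_binary_svm(xs, [1 if lab == c else -1 for lab in labels],
                                self.lam, self.epochs)
            for c in self.classes_
        }
        return self

    def predict(self, name):
        x = char_ngrams(name)
        # Pick the class whose binary SVM gives the highest score.
        return max(self.classes_, key=lambda c: dot(self.weights_[c], x))

# Hypothetical toy lists standing in for the two name dictionaries plus surnames.
male = ["john", "michael", "robert", "james", "david"]
female = ["mary", "jennifer", "linda", "susan", "maria"]
surnames = ["smith", "johnson", "williams", "brown", "garcia"]

names = male + female + surnames
labels = ["MALE"] * len(male) + ["FEMALE"] * len(female) + ["NO_GENDER"] * len(surnames)
clf = OneVsRestSVM().fit(names, labels)
print(clf.predict("Roberta"), clf.predict("Jones"))
```

Character n-grams are used because endings such as "-a" or "-son" carry much of the gender and surname signal, which lets the model generalize to names absent from both dictionaries. With real dictionaries the surname class should be roughly as large as the two name classes, or the SVM will rarely predict NO_GENDER.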
PREPROCESS
This will give you some trouble due to varying name formats. You may have trouble with compound names, such as:
Bobby Jo: male name with female modifier
van der Waal: compound surname with male-looking prefix
St. John: surname with gendered primary
Haley-Christopher: hyphenated surname, gendered
When you pre-process the inputs, you may have some trouble spotting the proper division in, say, Billy Jean St. John or Marie-Therese von Klaus.
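The preprocessing step could start from a simple heuristic like the one below. It is a sketch under stated assumptions, not a complete solution: the particle list `SURNAME_PARTICLES` and the function `given_name_part` are hypothetical, and the rules (a single token is passed through, everything before the first surname particle is kept, otherwise the last token is dropped as the surname) are illustrative choices.

```python
# Hypothetical, deliberately incomplete set of lowercase surname particles.
SURNAME_PARTICLES = {"van", "von", "der", "den", "de", "del", "da", "di", "st.", "st", "saint"}

def given_name_part(full_name):
    """Heuristically return the given-name portion of a full name.

    - A single token is returned unchanged, so the classifier can still
      label a bare surname as NO_GENDER.
    - Otherwise, everything before the first surname particle is kept
      (an empty result means the input looks like a surname only).
    - If no particle is found, the last token is treated as the surname.
    """
    tokens = full_name.split()
    if len(tokens) <= 1:
        return full_name.strip()
    for i, tok in enumerate(tokens):
        if tok.lower() in SURNAME_PARTICLES:
            return " ".join(tokens[:i])
    return " ".join(tokens[:-1])

for n in ["Billy Jean St. John", "Marie-Therese von Klaus", "van der Waal", "Bobby Jo", "Smith"]:
    print(repr(n), "->", repr(given_name_part(n)))
```

This handles the particle cases, but it shows the ambiguity that remains: "Bobby Jo" comes back as just "Bobby", because nothing distinguishes a two-part given name from a given name plus surname, and a lone "Haley-Christopher" is passed straight to the classifier, which may see a gendered given name.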
Upvotes: 1