user2773755

Reputation: 117

Parse full names for UK users in C++, considering multiple scenarios

I am working with a third-party API that receives the forename, middle name, and surname as separate parameters. What I have now is the user's full name in a single field. I cannot change this structure because it is part of a large-scale system.

The parameters of the third-party API cannot change either, so although it is not the best solution, I have to parse the full name and split it into three fields.

The names are only from the UK, which helps. This is what I am doing now:

vector<string> fields;
myLibrary::split(' ', name, fields);
string firstName = fields[0];
string middleName = fields.size() == 3 ? fields[1] : "";
string surName = fields.size() == 3 ? fields[2] : fields[1];

But I am pretty sure this is a bad solution, since there are multiple scenarios it does not handle, for example when a name has Mr. at the beginning or Jr. at the end.

Is there any code block that can give me better guarantees than this code?

Thanks!

[UPDATE]

The form asks the user for the "Account Holder's Name", and this information is tied to the bank account details, so the user won't enter a nickname. I gave Jr. and Sr. as one scenario I have found; there may be other known problems with names. Some insight into the properties of names in the UK would be useful.

Upvotes: 0

Views: 154

Answers (2)

DFlorin

Reputation: 1

I guess you can use the substr() member function of std::string. Check your surName or firstName variable and prevent it from being assigned "Mr." or "Jr.". One suggestion could be:

// Check the length first: substr() throws std::out_of_range if the string is shorter than 3 characters.
if (surname.length() >= 3 && surname.substr(surname.length() - 3) == "Jr.")
   surname = "";

Upvotes: 0

StilesCrisis

Reputation: 16300

If you have freeform input in the full name, including even things which aren't the name (like "Mr." which is technically a title separate from one's name), there's simply no foolproof way to convert that into a first/middle/last split.

You can come up with a system of heuristics which catches the common cases, and keep running your data set through until it passes. But you won't find a silver bullet that makes the problem go away.
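As a rough illustration of what such heuristics might look like, here is a minimal sketch. It assumes titles only appear at the start and suffixes only at the end, that the first remaining word is the forename, the last is the surname, and everything in between is the middle name. The `ParsedName` struct, the `parseFullName` function, and the title and suffix lists are all hypothetical and deliberately incomplete; it also splits with `std::istringstream` rather than the `myLibrary::split` from the question.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type; the field names mirror the API parameters.
struct ParsedName {
    std::string forename;
    std::string middleName;
    std::string surname;
};

// Lower-case a token and drop one trailing '.', so "Mr." and "MR" compare equal.
static std::string normalise(std::string token) {
    if (!token.empty() && token.back() == '.') token.pop_back();
    std::transform(token.begin(), token.end(), token.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return token;
}

static bool isOneOf(const std::string& token, const std::vector<std::string>& list) {
    return std::find(list.begin(), list.end(), normalise(token)) != list.end();
}

ParsedName parseFullName(const std::string& fullName) {
    // Assumed, incomplete lists; extend them from what the real data contains.
    static const std::vector<std::string> titles = {"mr", "mrs", "ms", "miss", "dr", "sir"};
    static const std::vector<std::string> suffixes = {"jr", "sr", "ii", "iii"};

    // Split on any run of whitespace, so repeated spaces do not create empty fields.
    std::istringstream in(fullName);
    std::vector<std::string> tokens;
    for (std::string t; in >> t;) tokens.push_back(t);

    // Drop leading titles and trailing suffixes.
    while (!tokens.empty() && isOneOf(tokens.front(), titles)) tokens.erase(tokens.begin());
    while (!tokens.empty() && isOneOf(tokens.back(), suffixes)) tokens.pop_back();

    ParsedName result;
    if (tokens.empty()) return result;       // nothing usable left
    result.forename = tokens.front();
    if (tokens.size() == 1) return result;   // single word: no surname to assign
    result.surname = tokens.back();

    // Everything between the first and last word is treated as the middle name.
    for (std::size_t i = 1; i + 1 < tokens.size(); ++i) {
        if (!result.middleName.empty()) result.middleName += ' ';
        result.middleName += tokens[i];
    }
    return result;
}
```

With this sketch, `parseFullName("Mr. John Ronald Smith Jr.")` gives John / Ronald / Smith, and a single-word input no longer indexes past the end of the vector. It still gets multi-word surnames wrong (an unhyphenated double-barrelled surname ends up half in the middle name), which is exactly the kind of case you would have to find by running your data through it.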

My advice would be to write your heuristics, get them working as well as you can, and then convert the data in the large-scale system to a split format. Whoever designed the name as one monolithic field simply made a mistake. If converting is not possible, let your employers know that this will continue to cause problems down the road and that the design error was not yours.

Upvotes: 1
