Reputation: 4860
I'm trying to standardize the format of some author names in C#. The tricky ones are authors who go by their initials. For example, the author of the popular Harry Potter series might appear as any one of these:
I want to standardize all of these to "JK Rowling".
I'm also trying to solve for names like JRR Tolkien, where there are three initials instead of just two.
After an easy replace of the ".", I'm left with "J K Rowling" or "J R R Tolkien". And I want to convert these into "JK Rowling" and "JRR Tolkien".
So the logic is: capture a single letter followed by any amount of whitespace, where the next thing after the whitespace is another single letter (which should not be part of the capture). Then remove the whitespace from the capture and replace the capture with the cleaned-up string.
Here are some samples:
I've gotten to this point where I'm able to capture the characters I need:
(\b[a-zA-Z]\b\s*)*
https://www.debuggex.com/r/OLnu3YvvjIumGbQ1
But I'm not sure where to go from here in order to replace the capture with a version that doesn't have any white space.
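For reference, a minimal sketch of the replacement step described above, assuming the dots have already been replaced by spaces. The class name `InitialsCollapse`, the `Collapse` helper, and the adjusted pattern are illustrative assumptions, not part of the original attempt. The pattern matches a whole run of single-letter words without the trailing whitespace, and the match evaluator passed to `Regex.Replace` strips the whitespace inside each run:

```csharp
using System;
using System.Text.RegularExpressions;

static class InitialsCollapse
{
    // Assumed pattern: a single-letter word followed by one or more
    // whitespace-separated single-letter words, e.g. "J K" or "J R R".
    static readonly Regex InitialRun =
        new Regex(@"\b[a-zA-Z]\b(?:\s+[a-zA-Z]\b)+");

    public static string Collapse(string name)
    {
        // The evaluator receives each matched run and returns it
        // with all whitespace removed.
        return InitialRun.Replace(name, m => Regex.Replace(m.Value, @"\s+", ""));
    }

    static void Main()
    {
        Console.WriteLine(Collapse("J K Rowling"));   // JK Rowling
        Console.WriteLine(Collapse("J R R Tolkien")); // JRR Tolkien
    }
}
```

Because the run ends at the last initial rather than at the whitespace after it, the space before the surname is left untouched.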
Upvotes: 1
Views: 96
Reputation: 5036
Do you need to use regular expressions? You could just split the name and then re-insert spaces wherever your rules say they belong (this might be easier to change if you find a new pattern). Something like this:
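// Requires: using System; using System.Text;
string FixName(string name)
{
    var sb = new StringBuilder();
    // Treat dots as separators and drop the empty entries that repeated separators produce.
    var parts = name.Replace('.', ' ').Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    for (int i = 0; i < parts.Length; i++)
    {
        sb.Append(parts[i]);
        // Keep the space unless this part and the next one are both single-letter initials,
        // so "J K Rowling" becomes "JK Rowling" while "John Q Public" is left alone.
        if (i < parts.Length - 1 && !(parts[i].Length == 1 && parts[i + 1].Length == 1))
            sb.Append(' ');
    }
    return sb.ToString();
}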
Upvotes: 1
Reputation: 174696
Use this regex and then replace the match with an empty string.
@"(?<=\b[A-Z])[.\s]+(?=[A-Z]\b)|(?<=\b[A-Z])\.(?=\s[A-Z])"
string result = Regex.Replace(yourString, @"(?<=\b[A-Z])[.\s]+(?=[A-Z]\b)|(?<=\b[A-Z])\.(?=\s[A-Z])", "");
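To see how the pattern behaves, here is a small sketch that runs it over a few spellings. The class name `InitialsDemo`, the `Normalize` wrapper, and the sample inputs are assumptions made for illustration. The first alternative removes dots and whitespace between two single uppercase initials; the second removes the dot after the last initial when a space and an uppercase letter follow:

```csharp
using System;
using System.Text.RegularExpressions;

static class InitialsDemo
{
    // The pattern from the answer above, unchanged.
    const string Pattern =
        @"(?<=\b[A-Z])[.\s]+(?=[A-Z]\b)|(?<=\b[A-Z])\.(?=\s[A-Z])";

    public static string Normalize(string name)
    {
        // Replace every match with an empty string.
        return Regex.Replace(name, Pattern, "");
    }

    static void Main()
    {
        Console.WriteLine(Normalize("J. K. Rowling"));    // JK Rowling
        Console.WriteLine(Normalize("J.K. Rowling"));     // JK Rowling
        Console.WriteLine(Normalize("J K Rowling"));      // JK Rowling
        Console.WriteLine(Normalize("J. R. R. Tolkien")); // JRR Tolkien
    }
}
```

Note that the pattern only looks at uppercase ASCII letters, so lowercase initials are left as they are, and a spelling with no space before the surname (such as "J.K.Rowling") keeps its last dot.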
Upvotes: 2