Trevor

Reputation: 4860

How do I replace a set of single characters and whitespace with the same characters but no whitespace?

I'm trying to standardize the format of some author names in C#. The tricky ones are authors who use initials. For example, the author of the popular Harry Potter series might be written in any of these ways:

I want to standardize all of these to "JK Rowling".

I also need to handle names like JRR Tolkien, which have three initials instead of two.

After a simple replace of the "." characters, I'm left with "J K Rowling" or "J R R Tolkien", and I want to convert these into "JK Rowling" and "JRR Tolkien".
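One way to do that dot-replacement step is sketched below. `NameCleanup` and `NormalizeDots` are hypothetical names, and collapsing repeated whitespace is an added assumption so that inputs like "J. K. Rowling" don't end up with double spaces once the dots become spaces.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameCleanup
{
    // Hypothetical helper: turn every "." into a space, then collapse runs of
    // whitespace into a single space and trim the ends, so that
    // "J.K. Rowling", "J. K. Rowling" and "J.K.Rowling" all become "J K Rowling".
    public static string NormalizeDots(string name)
    {
        string spaced = name.Replace('.', ' ');
        return Regex.Replace(spaced, @"\s+", " ").Trim();
    }
}
```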

So the logic is: capture a single letter followed by any amount of whitespace and then by (but not including) another single letter. Remove the whitespace from the capture and replace the capture with the cleaned-up string.

Here are some samples:

I've gotten as far as capturing the characters I need:

(\b[a-zA-Z]\b\s*)*

https://www.debuggex.com/r/OLnu3YvvjIumGbQ1

But I'm not sure how to go from here to replacing the capture with a version that has no whitespace.
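A minimal sketch of the "replace the capture with a cleaned-up version" step, assuming names have already had their dots removed: the `Regex.Replace` overload that takes a `MatchEvaluator` lets each match be rewritten individually, here by stripping whitespace from it. `InitialsJoiner` and `JoinInitials` are hypothetical names, and the pattern is a variation on the one above rather than the original.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InitialsJoiner
{
    // Two or more standalone letters separated by whitespace, e.g. "J K" or "J R R".
    // Unlike (\b[a-zA-Z]\b\s*)*, this cannot produce empty matches, and it does not
    // swallow the space before the surname ("J K " would otherwise become "JKRowling").
    private static readonly Regex InitialRun =
        new Regex(@"\b[A-Za-z]\b(?:\s+[A-Za-z]\b)+");

    public static string JoinInitials(string name)
    {
        // MatchEvaluator: remove whitespace only inside each matched run of initials.
        return InitialRun.Replace(name, m => Regex.Replace(m.Value, @"\s+", ""));
    }
}
```

Names that are already standardized, such as "JK Rowling", contain no run of standalone letters and pass through unchanged.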

Upvotes: 1

Views: 96

Answers (3)

Nadia Chibrikova
Nadia Chibrikova

Reputation: 5036

Do you need to use regular expressions? You could just split the name and then insert spaces according to your own rules (which might be easier to change if you find a new pattern). Something like this:

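using System;
using System.Text;

string FixName(string name)
{
    var sb = new StringBuilder();
    // Treat dots as separators and drop the empty entries left by sequences like ". "
    var parts = name.Replace('.', ' ')
                    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    for (int i = 0; i < parts.Length; i++)
    {
        sb.Append(parts[i]);
        // Add a space after this token unless it and the next one are both single initials
        if (i < parts.Length - 1 && !(parts[i].Length == 1 && parts[i + 1].Length == 1))
            sb.Append(' ');
    }
    return sb.ToString();
}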

Upvotes: 1

Avinash Raj
Avinash Raj

Reputation: 174696

Use this regex and then replace the match with an empty string.

@"(?<=\b[A-Z])[.\s]+(?=[A-Z]\b)|(?<=\b[A-Z])\.(?=\s[A-Z])"


using System.Text.RegularExpressions;
string result = Regex.Replace(yourString, @"(?<=\b[A-Z])[.\s]+(?=[A-Z]\b)|(?<=\b[A-Z])\.(?=\s[A-Z])", "");
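A minimal self-contained sketch that wraps this pattern in a helper so it can be run against the question's examples. `LookaroundFix` and `Standardize` are hypothetical names; the pattern itself is copied unchanged from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundFix
{
    // First alternative: dots/whitespace between two single capital letters ("J.K" -> "JK").
    // Second alternative: a dot after an initial that is followed by a space and
    // a capital letter, i.e. the dot before the surname ("K. Rowling" -> "K Rowling").
    private const string Pattern =
        @"(?<=\b[A-Z])[.\s]+(?=[A-Z]\b)|(?<=\b[A-Z])\.(?=\s[A-Z])";

    // Every match is removed by replacing it with the empty string.
    public static string Standardize(string name) => Regex.Replace(name, Pattern, "");
}
```

Because the lookarounds only assert context, each match consists solely of the dots and whitespace to delete, which is why an empty replacement string is enough.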

Upvotes: 2

Aran-Fey
Aran-Fey

Reputation: 43126

Try replacing

\b(\w)\.?\s*(?!\w\w)

with $1 (the captured character).
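A minimal self-contained sketch of this replacement in C#. `BackreferenceFix` and `Standardize` are hypothetical names; the pattern and the `$1` replacement are the ones from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackreferenceFix
{
    // A word character at a word boundary, then an optional dot and any whitespace.
    // The (?!\w\w) lookahead makes \s* give the space back when a longer word
    // (the surname) comes next, so "K. Rowling" matches only "K.".
    private const string Pattern = @"\b(\w)\.?\s*(?!\w\w)";

    // "$1" re-inserts only the captured character, dropping the dot and whitespace.
    public static string Standardize(string name) => Regex.Replace(name, Pattern, "$1");
}
```

The backreference approach keeps the letter and drops everything else in the match, so no lookbehind is needed.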


Upvotes: 3
