Icarus

Reputation: 63966

Regular Expression - Is this possible?

Rather than describing what I want (it's difficult to explain), let me provide an example of what I need to accomplish in C# using a regular expression:

"HelloWorld" should be transformed to "Hello World" 
"HelloWORld" should be transformed to "Hello WO Rld" //Two consecutive capital letters should be treated as one word
"helloworld" should be transformed to "helloworld"

EDIT:

"HellOWORLd" should be transformed to "Hell OW OR Ld"

Every two consecutive capital letters should be considered one word.

Is this possible?

Upvotes: 1

Views: 320

Answers (6)

sehe

Reputation: 393154

This is fully working C# code, not just the regex:

Console.WriteLine(
    Regex.Replace(
        "HelloWORld", 
        "(?<!^)(?<wordstart>[A-Z]{1,2})", 
        " ${wordstart}", RegexOptions.Compiled));

And it prints:

Hello WO Rld

Update

To make this more Unicode/international aware, consider replacing [A-Z] with \p{Lu} (meaning a Unicode code point that represents an uppercase letter). The result for the current input would be the same. So here is a slightly more compelling example:

Console.WriteLine(Regex.Replace(
            @"ÉclaireürfØÑJßå",
            @"(?<!^)(?<wordstart>\p{Lu}{1,2})", 
            @" ${wordstart}",
            RegexOptions.Compiled));
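
As a rough sketch of how the pattern above behaves on all four inputs from the question, it can be wrapped in a small helper. The names WordSplitter and SplitWords are made up for illustration and are not part of any library; the pattern itself is the one shown above.

```csharp
using System;
using System.Text.RegularExpressions;

static class WordSplitter
{
    // Hypothetical helper: inserts a space before every run of one or two
    // uppercase letters, except a run that starts at the very beginning.
    public static string SplitWords(string input)
    {
        return Regex.Replace(
            input,
            @"(?<!^)(?<wordstart>\p{Lu}{1,2})",
            " ${wordstart}");
    }

    static void Main()
    {
        var samples = new[] { "HelloWorld", "HelloWORld", "helloworld", "HellOWORLd" };
        foreach (var s in samples)
        {
            Console.WriteLine(WordSplitter.SplitWords(s));
        }
        // Prints:
        // Hello World
        // Hello WO Rld
        // helloworld
        // Hell OW OR Ld
    }
}
```

Note that an input beginning with two capitals (for example "HElloWorld") would get a space after the first letter, because only position 0 is excluded by the lookbehind.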

Upvotes: 7

Kakashi

Reputation: 2195

I don't think you need a regular expression in this case. Try this:

static void Main(string[] args)
{
    // Note: the split positions are fixed (the first 4 characters, then pairs),
    // so this only works for inputs shaped like "HellOWORLd".
    var input = "HellOWORLd";
    var i = 0;
    var x = 4;
    var len = input.Length;
    var output = new List<string>();
    while (x <= len)
    {
        output.Add(SubStr(input, i, x));
        i = x;
        x += 2;
    }
    var ret = output.ToArray(); // ["Hell", "OW", "OR", "Ld"]

    Console.ReadLine();
}

// Returns the characters of str in the range [start, end),
// or null if the range is out of bounds.
static string SubStr(string str, int start, int end)
{
    var len = str.Length;
    if (start >= 0 && end <= len)
    {
        var ret = new StringBuilder();
        for (int i = 0; i < len; i++)
        {
            if (i == start)
            {
                do
                {
                    ret.Append(str[i]);
                    i++;
                } while (i != end);
            }
        }
        return ret.ToString();
    }
    return null;
}

Upvotes: 0

Mark Cidade

Reputation: 99957

string f(string input)
{ 
  //'lowerUPPER' -> 'lower UPPER'
  var x = Regex.Replace(input, "([a-z])([A-Z])","$1 $2"); 

  //'UPPER' -> 'UP PE R'
  return Regex.Replace(x, "([A-Z]{2})","$1 "); 
}

Upvotes: 1

ChaosPandion

Reputation: 78282

using System;
using System.Collections.Generic;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        Print(Parse("HelloWorld"));
        Print(Parse("HelloWORld"));
        Print(Parse("helloworld"));
        Print(Parse("HellOWORLd"));
        Console.ReadLine();
    }

    static void Print(IEnumerable<string> input)
    {
        foreach (var s in input)
        {
            Console.Write(s);
            Console.Write(' ');
        }
        Console.WriteLine();
    }

    static IEnumerable<string> Parse(string input)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < input.Length; i++)
        {
            if (!char.IsUpper(input[i]))
            {
                sb.Append(input[i]);
                continue;
            }
            if (sb.Length > 0)
            {
                yield return sb.ToString();
                sb.Clear();
            }
            sb.Append(input[i]);
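            // Bounds check: avoids an IndexOutOfRangeException when the last character is uppercase.
            if (i + 1 < input.Length && char.IsUpper(input[i + 1]))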
            {
                sb.Append(input[++i]);
                yield return sb.ToString();
                sb.Clear();
            }
        }
        if (sb.Length > 0)
        {
            yield return sb.ToString();
        }
    }
}

Upvotes: 0

Peter Short

Reputation: 772

Here are regular expressions that detect what you are looking for:

([A-Z]\w*?)[A-Z]

This matches a single uppercase letter from A to Z, followed by alphanumerics up to the next uppercase letter.

([A-Z]{2}\w*?)[A-Z]

This matches exactly two uppercase letters from A to Z, followed by alphanumerics up to the next uppercase letter.

Regex is a matching engine: you can parse the input string and use Regex.IsMatch to find candidate matches, then insert spaces into the output string.
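
One way the "match, then insert spaces yourself" idea could look is sketched below. It is only an illustration under a few assumptions: it uses Regex.Matches (which reports match positions) rather than IsMatch, it uses the simpler pattern [A-Z]{1,2} instead of the two patterns above, and the names SpaceInserter and InsertSpaces are invented for the example.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class SpaceInserter
{
    // Hypothetical helper: finds runs of one or two uppercase letters and
    // copies the input to the output, adding a space before each run that
    // does not start at index 0.
    public static string InsertSpaces(string input)
    {
        var sb = new StringBuilder();
        int last = 0; // start of the part of input not yet copied

        foreach (Match m in Regex.Matches(input, "[A-Z]{1,2}"))
        {
            // Copy everything between the previous match start and this one.
            sb.Append(input, last, m.Index - last);
            if (m.Index > 0)
            {
                sb.Append(' ');
            }
            last = m.Index;
        }

        // Copy the remainder, including the final uppercase run (if any).
        sb.Append(input, last, input.Length - last);
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(SpaceInserter.InsertSpaces("HellOWORLd")); // Hell OW OR Ld
        Console.WriteLine(SpaceInserter.InsertSpaces("helloworld")); // helloworld
    }
}
```

The regex is used here purely for matching; the string building is done by hand with a StringBuilder.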

Upvotes: 1

qJake

Reputation: 17139

The regular expression engine is not a transformative thing by nature, but rather a pattern matching (and replacing) engine. People often mistake the replace part of Regex, thinking that it can do more than it's designed to.

Back to your question, though... Regex cannot do what you want; instead, you should write your own parser to do this. With C#, if you're familiar with the language, this task is fairly trivial.

It's a case of "You're using the wrong tool for the job".

Upvotes: 2
