MindGame

Reputation: 1251

Format String : Parsing

I have a parsing question. I have a paragraph which has instances of :  word  . So basically it has a colon, two spaces, a word (could be anything), then two more spaces.

When I find those instances, I want to convert the string so that:

  1. There is a new line character after the colon and after the word.
  2. The double space after the word is removed.
  3. All double spaces are replaced with new line characters.

I don't know exactly how to go about this. I'm using C#. Bullet point 2 above is the part I'm having a hard time with.

Thanks

Upvotes: 1

Views: 403

Answers (5)

JYelton

Reputation: 36522

The following is an example using Regular Expressions. See also this question for more info.

Basically, the pattern string tells the regex to look for a colon followed by two spaces. The word that follows (any run of non-whitespace characters) is saved in a capture group named "word". Finally, two more spaces are specified to finish the pattern.

The replace uses a lambda which says for every match, replace it with a colon, a new line, the "lone" word, and another newline.

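// Requires: using System; using System.Text.RegularExpressions;
string Paragraph = "Jackdaws love my big sphinx of quartz:  fizz  The quick onyx goblin jumps over the lazy dwarf. Where:  buzz  The crazy dogs.";
// Colon, two spaces, the word (captured as "word"), then two more spaces.
// \S+ requires at least one non-whitespace character for the word.
string Pattern = @":  (?<word>\S+)  ";
string Result = Regex.Replace(Paragraph, Pattern, m =>
    {
        // Read the capture by name rather than by index.
        var LoneWord = m.Groups["word"].Value;
        return ":" + Environment.NewLine + LoneWord + Environment.NewLine;
    });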

Input

Jackdaws love my big sphinx of quartz:  fizz  The quick onyx goblin jumps over the lazy dwarf. Where:  buzz  The crazy dogs.

Output

Jackdaws love my big sphinx of quartz:
fizz
The quick onyx goblin jumps over the lazy dwarf. Where:
buzz
The crazy dogs.

Note, for item 3 on your list, if you also want to replace individual occurrences of two spaces with newlines, you could do this:

Result = Result.Replace("  ", Environment.NewLine);
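
Putting the regex replacement and the plain double-space replacement together, a minimal sketch might look like the following. The class and method names (ColonWordFormatter, Format) are made up for illustration, and the sample sentence is just a shortened version of the input above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class ColonWordFormatter
{
    public static string Format(string paragraph)
    {
        // Items 1 and 2: ":  word  " becomes ":" + newline + "word" + newline.
        string result = Regex.Replace(
            paragraph,
            @":  (?<word>\S+)  ",
            m => ":" + Environment.NewLine + m.Groups["word"].Value + Environment.NewLine);

        // Item 3: every remaining pair of spaces becomes a newline.
        return result.Replace("  ", Environment.NewLine);
    }

    public static void Main()
    {
        // Prints four lines: "Where:", "buzz", "The crazy dogs.", "The end."
        Console.WriteLine(Format("Where:  buzz  The crazy dogs.  The end."));
    }
}
```

The order matters in this sketch: running the plain Replace first would consume the double spaces that the regex needs in order to find the colon-word pairs.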

Upvotes: 1

Bala R

Reputation: 108977

You can try

var str = ":  first  :  second  ";
var result = Regex.Replace(str, ":\\s{2}(?<word>[a-zA-Z0-9]+)\\s{2}",
                           ":\n${word}\n");

Upvotes: 2

Zhais

Reputation: 1541

Using regular expressions will give you exact matches on what you are looking for.

The regex match for a colon, two spaces, a word, then two more spaces is:

Dim reg As New Regex(":  [a-zA-Z]*  ")

[a-zA-Z] matches any single character within the alphabetical range. You can append 0-9 as well if you accept digits within the word. The * afterwards indicates that there can be zero or more instances of the preceding element.

[a-zA-Z]* will attempt to match the longest run of contiguous alphabetic characters.

Upon further reading, you may use \w in place of [a-zA-Z0-9] if that's what you are looking for. This will match any 'word' character, which also includes the underscore.

source: http://msdn.microsoft.com/en-us/library/ms972966.aspx

You can retrieve all the matches using reg.Matches(inputString).
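
Since the question is about C#, here is a rough C# sketch of enumerating the matches. It is a variation rather than a literal translation of the pattern above: it uses + instead of * so that an empty word is not matched, and it adds a named group so the word can be read without the colon and spaces. The names MatchListing and FindWords are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class MatchListing
{
    public static List<string> FindWords(string input)
    {
        // Colon, two literal spaces, one or more letters, two literal spaces.
        var reg = new Regex(":  (?<word>[a-zA-Z]+)  ");
        var words = new List<string>();

        // Matches returns every non-overlapping match in the input.
        foreach (Match m in reg.Matches(input))
        {
            words.Add(m.Groups["word"].Value);
        }
        return words;
    }

    public static void Main()
    {
        var words = FindWords("Where:  buzz  The crazy dogs. Then:  fizz  done.");
        Console.WriteLine(string.Join(", ", words)); // prints "buzz, fizz"
    }
}
```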

Review http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace.aspx for more information on regular expression replacements and your options from there.

edit: Before, I was using \s to search for spaces. This will match any whitespace character, including tabs, new lines and others. That is not what we want, so I reverted it to search for exact space characters.

Upvotes: 2

Oded

Reputation: 499132

Assuming your original string is exactly in the form you described, this will do:

var newString = myString.Trim().Replace("  ", "\n");

The Trim() removes leading and trailing whitespace, taking care of the spaces at the end of the string.

Then the Replace replaces each remaining pair of space characters ("  ") with a "\n" new line character.

The result is assigned to the newString variable. This is needed because myString will not change: strings in .NET are immutable.
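
To make the immutability point concrete, here is a small sketch. The class and method names (ImmutableStringDemo, Convert) and the sample input are assumptions for illustration only.

```csharp
using System;

// Hypothetical demo class, named here only for illustration.
public static class ImmutableStringDemo
{
    public static string Convert(string s)
    {
        // Trim and Replace each return a new string; s itself is never modified.
        return s.Trim().Replace("  ", "\n");
    }

    public static void Main()
    {
        string myString = "first  second  ";
        string newString = Convert(myString);

        Console.WriteLine(myString == "first  second  "); // True: the original is unchanged
        Console.WriteLine(newString == "first\nsecond");  // True: only the result differs
    }
}
```

Calling myString.Trim().Replace("  ", "\n") without assigning the result would therefore have no visible effect.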

I suggest you read up on the String class and all its methods and properties.

Upvotes: 3

Piotr Perak

Reputation: 11088

You can use string.TrimEnd - http://msdn.microsoft.com/en-us/library/system.string.trimend.aspx - to trim spaces at the end of the string.
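
As a quick sketch of the difference from Trim(), the snippet below trims only trailing space characters and leaves leading ones alone. The names TrimEndDemo and StripTrailingSpaces are hypothetical.

```csharp
using System;

// Hypothetical demo class, named here only for illustration.
public static class TrimEndDemo
{
    public static string StripTrailingSpaces(string s)
    {
        // Passing ' ' limits trimming to space characters at the end of the string.
        return s.TrimEnd(' ');
    }

    public static void Main()
    {
        Console.WriteLine("[" + StripTrailingSpaces("  word  ") + "]"); // prints "[  word]"
    }
}
```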

Upvotes: 1
