Teachme

Reputation: 715

Split a string that has white spaces, unless they are enclosed within "quotes"?

To make things simple:

string streamR = sr.ReadLine();  // sr.Readline results in:
                                 //                         one "two two"

I want to save them as two different strings, removing all spaces EXCEPT the spaces found between quotation marks. What I need is:

string 1 = one
string 2 = two two

So far what I have found that works is the following code, but it removes the spaces within the quotes.

// streamR.ReadLine only has two strings
string[] splitter = streamR.Split(' ');
str1 = splitter[0];
// Only set str2 if the length is > 1
str2 = splitter.Length > 1 ? splitter[1] : string.Empty;

The output of this becomes

one
two

I have looked into "Regular Expression to split on spaces unless in quotes", but I can't get the regex to work or understand the code, especially how to split the input into two different strings. All the code there gives me a compile error (I am using System.Text.RegularExpressions).

Upvotes: 68

Views: 54196

Answers (8)

Riccardo Volpe

Reputation: 1623

I used these patterns:

Without including quotes (single and double) and without positive lookbehind:

pattern = "/[^''\""]+(?=[''\""][ ]|[''\""]$)|[^''\"" ]+(?=[ ]|$)/gm"

Without including quotes (single and double) and with positive lookbehind:

pattern = "/(?<=[ ][''\""]|^[''\""])[^''\""]+(?=[''\""][ ]|[''\""]$)|(?<=[ ]|^)[^''\"" ]+(?=[ ]|$)/gm"

Including quotes (single and double) and without positive lookbehind:

pattern = "/[''].+?['']|[\""].+?[\""]|[^ ]+/gm"

Tested here:

  1. regex101
  2. regexr
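
The patterns above appear to be written in the /pattern/flags notation used by those testing sites. A minimal sketch of how the third pattern might be used from C#, assuming the surrounding slashes and the gm flags are dropped (Regex.Matches already returns every match, and the pattern has no anchors). The class and method names are hypothetical and not part of the answer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotePatternDemo
{
    // Hypothetical helper: returns each token, keeping its quotes.
    public static List<string> SplitKeepingQuotes(string input)
    {
        // Same alternation as the third pattern, without the /.../gm wrapper:
        // a single-quoted run, a double-quoted run, or a run of non-space characters.
        const string pattern = @"'.+?'|"".+?""|[^ ]+";
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }

    public static void Main()
    {
        foreach (var part in SplitKeepingQuotes("one \"two two\" 'three three' four"))
            Console.WriteLine(part);
    }
}
```

For the sample input in Main this prints one, "two two", 'three three' and four, each on its own line.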

Upvotes: 0

Kux

Reputation: 1489

With support for escaped (doubled) double quotes.

String:

a "b b" "c ""c"" c"

Result:

a 
"b b"
"c ""c"" c"

Code:

var list=Regex.Matches(value, @"\""(\""\""|[^\""])+\""|[^ ]+", 
    RegexOptions.ExplicitCapture)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();

Optionally, remove the enclosing double quotes and unescape the doubled ones:

.Select(m => m.StartsWith("\"") ? m.Substring(1, m.Length - 2).Replace("\"\"", "\"") : m)

Result:

a 
b b
c "c" c
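
Putting the two snippets together, a minimal self-contained sketch might look like this. The class and method names are illustrative, and the extra length and closing-quote check is an added assumption so that a lone or unclosed quote is left untouched instead of being passed to Substring:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DoubledQuoteSplitter
{
    // Illustrative helper: splits on spaces, keeps quoted runs together,
    // then strips the enclosing quotes and turns "" back into ".
    public static List<string> Split(string value)
    {
        return Regex.Matches(value, @"\""(\""\""|[^\""])+\""|[^ ]+",
                             RegexOptions.ExplicitCapture)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .Select(Unquote)
                    .ToList();
    }

    private static string Unquote(string s)
    {
        // Assumption: only unquote tokens that both start and end with a quote.
        bool quoted = s.Length >= 2 && s[0] == '"' && s[s.Length - 1] == '"';
        return quoted ? s.Substring(1, s.Length - 2).Replace("\"\"", "\"") : s;
    }

    public static void Main()
    {
        foreach (var part in Split("a \"b b\" \"c \"\"c\"\" c\""))
            Console.WriteLine(part);
    }
}
```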

Upvotes: 5

psubsee2003

Reputation: 8741

A custom parser might be more suitable for this.

This is something I wrote once when I had a specific (and very strange) parsing requirement that involved parentheses and spaces, but it is generic enough that it should work with virtually any delimiter and text qualifier.

public static IEnumerable<String> ParseText(String line, Char delimiter, Char textQualifier)
{

    if (line == null)
        yield break;

    else
    {
        Char prevChar = '\0';
        Char nextChar = '\0';
        Char currentChar = '\0';

        Boolean inString = false;

        StringBuilder token = new StringBuilder();

        for (int i = 0; i < line.Length; i++)
        {
            currentChar = line[i];

            if (i > 0)
                prevChar = line[i - 1];
            else
                prevChar = '\0';

            if (i + 1 < line.Length)
                nextChar = line[i + 1];
            else
                nextChar = '\0';

            if (currentChar == textQualifier && (prevChar == '\0' || prevChar == delimiter) && !inString)
            {
                inString = true;
                continue;
            }

            if (currentChar == textQualifier && (nextChar == '\0' || nextChar == delimiter) && inString)
            {
                inString = false;
                continue;
            }

            if (currentChar == delimiter && !inString)
            {
                yield return token.ToString();
                token = token.Remove(0, token.Length);
                continue;
            }

            token = token.Append(currentChar);

        }

        yield return token.ToString();

    } 
}

The usage would be:

var parsedText = ParseText(streamR, ' ', '"');
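
For comparison, here is a reduced sketch of the same state-machine idea. It is not the method above: it is a simplified variant, under the assumption that every text qualifier toggles the quoted state (no neighbour checks) and that empty tokens produced by repeated delimiters should be skipped. The names QuoteAwareTokenizer and Tokenize are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuoteAwareTokenizer
{
    // Simplified variant: walks the line once, toggling inQuotes on each qualifier.
    public static IEnumerable<string> Tokenize(string line, char delimiter = ' ', char qualifier = '"')
    {
        if (line == null)
            yield break;

        var token = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in line)
        {
            if (c == qualifier)
            {
                inQuotes = !inQuotes;          // qualifier itself is not kept
            }
            else if (c == delimiter && !inQuotes)
            {
                if (token.Length > 0)          // skip empty tokens
                    yield return token.ToString();
                token.Clear();
            }
            else
            {
                token.Append(c);               // delimiters inside quotes land here
            }
        }

        if (token.Length > 0)
            yield return token.ToString();
    }

    public static void Main()
    {
        foreach (var part in Tokenize("one \"two two\"  three"))
            Console.WriteLine(part);
    }
}
```

For the sample input in Main this prints one, two two and three, each on its own line.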

Upvotes: 19

I4V

Reputation: 35353

string input = "one \"two two\" three \"four four\" five six";
var parts = Regex.Matches(input, @"[\""].+?[\""]|[^ ]+")
                .Cast<Match>()
                .Select(m => m.Value)
                .ToList();
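
The matches above still include their quotation marks. A short sketch of how the result might be consumed when the quotes should be dropped, as the question asks. The Trim('"') step and the names RegexSplitDemo and SplitAndUnquote are additions for illustration, not part of the answer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexSplitDemo
{
    // Illustrative helper: same pattern as above, then strips leading/trailing quotes.
    public static List<string> SplitAndUnquote(string input)
    {
        return Regex.Matches(input, @"[\""].+?[\""]|[^ ]+")
                    .Cast<Match>()
                    .Select(m => m.Value.Trim('"'))
                    .ToList();
    }

    public static void Main()
    {
        string input = "one \"two two\" three \"four four\" five six";
        foreach (var part in SplitAndUnquote(input))
            Console.WriteLine(part);
    }
}
```

For this input the list holds six items: one, two two, three, four four, five, six.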

Upvotes: 63

user3566056

Reputation: 244

There's just a tiny problem with Squazz's answer: it works for his string, but not if you add more items. E.g.

string myString = "WordOne \"Word Two\" Three";

In that case, removing the last quotation mark would get us four results, not three.

That's easily fixed, though: count the number of escape characters and, if the count is odd, strip the last one (adapt as per your requirements).

// Extension method: must be declared inside a static class (requires System.Linq).
public static List<String> Split(this string myString, char separator, char escapeCharacter)
{
    int nbEscapeCharacters = myString.Count(c => c == escapeCharacter);
    if (nbEscapeCharacters % 2 != 0) // odd number of escape characters
    {
        int lastIndex = myString.LastIndexOf("" + escapeCharacter, StringComparison.Ordinal);
        myString = myString.Remove(lastIndex, 1); // remove the last escape character
    }
    var result = myString.Split(escapeCharacter)
                         .Select((element, index) => index % 2 == 0  // If even index
                                               ? element.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries)  // Split the item
                                               : new string[] { element })  // Keep the entire item
                         .SelectMany(element => element).ToList();
    return result;
}

I also turned it into an extension method and made separator and escape character configurable.

Upvotes: 1

John Koerner

Reputation: 38077

You can use the TextFieldParser class that is part of the Microsoft.VisualBasic.FileIO namespace (you'll need to add a reference to Microsoft.VisualBasic to your project):

string inputString = "This is \"a test\" of the parser.";

using (MemoryStream ms = new MemoryStream(Encoding.ASCII.GetBytes(inputString)))
{
    using (var tfp = new Microsoft.VisualBasic.FileIO.TextFieldParser(ms))
    {
        tfp.Delimiters = new string[] { " " };
        tfp.HasFieldsEnclosedInQuotes = true;
        string[] output = tfp.ReadFields();

        for (int i = 0; i < output.Length; i++)
        {
            Console.WriteLine("{0}:{1}", i, output[i]);
        }
    }
}

Which generates the output:

0:This
1:is
2:a test
3:of
4:the
5:parser.

Upvotes: 15

Squazz

Reputation: 4171

OP wanted to

... remove all spaces EXCEPT for the spaces found between quotation marks

The solution from Cédric Bignon almost did this, but didn't take into account that there could be an odd number of quotation marks. Checking for this first, and then removing the excess one, ensures that we only stop splitting if the element really is enclosed in quotation marks.

string myString = "WordOne \"Word Two";
int placement = myString.LastIndexOf("\"", StringComparison.Ordinal);
if (placement >= 0)
    myString = myString.Remove(placement, 1);

var result = myString.Split('"')
                     .Select((element, index) => index % 2 == 0  // If even index
                                           ? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)  // Split the item
                                           : new string[] { element })  // Keep the entire item
                     .SelectMany(element => element).ToList();

Console.WriteLine(result[0]);
Console.WriteLine(result[1]);
Console.ReadKey();

Credit for the logic goes to Cédric Bignon, I only added a safeguard.

Upvotes: 0

Cédric Bignon

Reputation: 13022

You can even do that without Regex: a LINQ expression with String.Split can do the job.

You can first split your string on ", then split only the elements at even indexes of the resulting array on the space character.

var result = myString.Split('"')
                     .Select((element, index) => index % 2 == 0  // If even index
                                           ? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)  // Split the item
                                           : new string[] { element })  // Keep the entire item
                     .SelectMany(element => element).ToList();

For the string:

This is a test for "Splitting a string" that has white spaces, unless they are "enclosed within quotes"

It gives the result:

This
is
a
test
for
Splitting a string
that
has
white
spaces,
unless
they
are
enclosed within quotes

UPDATE

string myString = "WordOne \"Word Two\"";
var result = myString.Split('"')
                     .Select((element, index) => index % 2 == 0  // If even index
                                           ? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)  // Split the item
                                           : new string[] { element })  // Keep the entire item
                     .SelectMany(element => element).ToList();

Console.WriteLine(result[0]);
Console.WriteLine(result[1]);
Console.ReadKey();

UPDATE 2

How do you define a quoted portion of the string?

We will assume that the string before the first " is non-quoted.

Then, the string placed between the first " and before the second " is quoted. The string between the second " and the third " is non-quoted. The string between the third and the fourth is quoted, ...

The general rule is: Each string between the (2*n-1)th (odd number) " and (2*n)th (even number) " is quoted. (1)

What is the relation with String.Split?

String.Split with the default StringSplitOptions (defined as StringSplitOptions.None) creates a list of one string and then adds a new string to the list for each splitting character found.

So, the string before the first " is at index 0 in the split array, the string between the first and second " is at index 1, the string between the second and third is at index 2, ...

The general rule is: The string between the nth and (n+1)th " is at index n in the array. (2)

Given (1) and (2), we can conclude that quoted portions are at odd indexes in the split array.
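
The index rule can be checked with a small sketch that prints every piece produced by Split('"') together with its index. The sample string and the names SplitIndexDemo and Pieces are chosen here for illustration only:

```csharp
using System;

public static class SplitIndexDemo
{
    // Illustrative helper: returns the raw pieces, with no further splitting.
    public static string[] Pieces(string text)
    {
        return text.Split('"');
    }

    public static void Main()
    {
        string myString = "WordOne \"Word Two\" Three \"Four Five\"";
        string[] pieces = Pieces(myString);

        for (int i = 0; i < pieces.Length; i++)
        {
            // Odd index: between an odd-numbered and an even-numbered quote, so quoted.
            string kind = i % 2 == 1 ? "quoted" : "non-quoted";
            Console.WriteLine("{0}: [{1}] ({2})", i, pieces[i], kind);
        }
    }
}
```

This prints:

0: [WordOne ] (non-quoted)
1: [Word Two] (quoted)
2: [ Three ] (non-quoted)
3: [Four Five] (quoted)
4: [] (non-quoted)

The trailing empty piece at index 4 comes from the closing quote at the end of the string, which is why the even-index pieces are split with StringSplitOptions.RemoveEmptyEntries.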

Upvotes: 47
