Ryan Smith

Reputation: 709

How to split a string with a space and leave anything inside a quotation alone?

Given a string entered by a user, I'm trying to split the string by removing any whitespace and getting each token.

But I'm having difficulty when a token is in quotation marks. Here are some examples to clarify:

User input: that is cool

Expected Output:

that
is
cool

User input: The book "Harry Potter" is cool

Expected Output:

The
book
"Harry Potter"
is
cool

User input: Here "     is   one   final   " example

Expected Output:

Here
"     is   one   final   "
example

This is what I have so far:

public static void main(String[] args) {
    String input;
    Scanner in = new Scanner(System.in);
    System.out.print("User input: ");
    input = in.nextLine();
    input = input.trim();
    input = input.replaceAll("\\s+", " ");
    String[] a = input.split(" ");

    for (String c: a) {
        System.out.println(c);
    }
}

It only works for the first example. For the examples with quotations, it splits on the spaces inside the quoted tokens as well. Example 3 output:

Here
"
is
one
final
"
example

Upvotes: 3

Views: 316

Answers (7)

Bohemian

Reputation: 424993

Here's how to do it in one line:

String[] terms = input.trim().split(" +(?=(([^\"]*\"){2})*[^\"]*$)");

This works by splitting on space(s) only when not within quotes, where "when not within quotes" is defined as "followed by an even number of quotes".

The call to trim() is optional given your examples, but it caters for the user entering leading spaces.


Some test code:

String input = "Here  \"     is   one   final   \"    example";
String[] terms = input.trim().split(" +(?=(([^\"]*\"){2})*[^\"]*$)");
Arrays.stream(terms).forEach(System.out::println);

Output:

Here
"     is   one   final   "
example
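
One way to see what the lookahead checks is to spell the same rule out by hand. The following is only a sketch, not the code from the answer above: the class name EvenQuoteSplit and the method split are made up for illustration, and it assumes the only quote character of interest is the double quote. It treats a space as a separator exactly when an even number of quotes follows it.

```java
import java.util.ArrayList;
import java.util.List;

class EvenQuoteSplit {
    // Hypothetical helper: splits on spaces that are followed by an even number of '"'.
    static List<String> split(String input) {
        String s = input.trim();

        // quotesAfter[i] = number of '"' characters at index i or later
        int[] quotesAfter = new int[s.length() + 1];
        for (int i = s.length() - 1; i >= 0; i--) {
            quotesAfter[i] = quotesAfter[i + 1] + (s.charAt(i) == '"' ? 1 : 0);
        }

        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ' ' && quotesAfter[i] % 2 == 0) {
                // Even number of quotes ahead: we are outside quotes, so this space separates tokens.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                // Either a non-space character or a space inside quotes: keep it.
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        split("Here  \"     is   one   final   \"    example").forEach(System.out::println);
    }
}
```

Unlike String.split, this sketch never produces empty tokens, so the two may differ on edge cases such as unbalanced quotes.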

Upvotes: 0

Madushan Perera

Reputation: 2598

What about this:

public static void main(String[] args) {
    //String s = "that         is      cool";
    //String s = "The       book   \"Harry Potter\"   is   cool";
    String s = "Here  \"     is   one   final   \"    example";
    Scanner scanner = new Scanner(s);
    scanner.useDelimiter(" +(?=(?:(?:.*?\\\"){2})*[^\\\"]*$)");
    while (scanner.hasNext()) {
        System.out.println(scanner.next());

    }
}

Upvotes: 0

Pshemo

Reputation: 124215

Don't focus on the things you want to split on. It is easier to focus on the things you want to find as a result:

private static final Pattern p = Pattern.compile("\"[^\"]+\"|\\S+");
//                                     quotes---  ^^^^^^^^^^ 
//                                     non-whitespace        ^^^^
public static List<String> splitTokensAndQuotes(String text) {
    List<String> result = new ArrayList<>();
    Matcher m = p.matcher(text);
    while (m.find()) {
        result.add(m.group());
    }
    return result;
} 

Demo:

public static void main(String[] args) {

    splitTokensAndQuotes("that         is      cool")
            .forEach(System.out::println);
    System.out.println("------");

    splitTokensAndQuotes("the       book   \"Harry Potter\"   is   cool")
            .forEach(System.out::println);
    System.out.println("------");

    splitTokensAndQuotes("Here  \"     is   one   final   \"    example")
            .forEach(System.out::println);
    System.out.println("------");

}

Result:

that
is
cool
------
the
book
"Harry Potter"
is
cool
------
Here
"     is   one   final   "
example
------

Upvotes: 1

Manos Nikolaidis

Reputation: 22224

You can use this pattern

Pattern pattern = Pattern.compile("\"([^\"]+)\"|'([^']+)'|\\S+");

to match either a run of non-whitespace characters or a quoted phrase, including the spaces inside the quotes. It also handles single-quoted phrases. Note that it keeps "it's" as a single word, which you may or may not want.

You would then iterate through all the matches like this:

Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    System.out.println(matcher.group());
}
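
For completeness, here is a self-contained sketch built around the same pattern. The class name QuoteAwareTokens, the method tokenize and the keepQuotes flag are hypothetical names added for illustration. The question wants the quotes kept, which is what matcher.group() gives you. The capture groups in the pattern also make it possible to return the text without the surrounding quotes, shown here only as an optional variant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareTokens {
    // Same pattern as above: double-quoted phrase, single-quoted phrase, or a run of non-whitespace.
    private static final Pattern PATTERN =
            Pattern.compile("\"([^\"]+)\"|'([^']+)'|\\S+");

    // Hypothetical helper; keepQuotes decides whether the surrounding quotes stay in the token.
    static List<String> tokenize(String input, boolean keepQuotes) {
        List<String> result = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        while (m.find()) {
            if (keepQuotes) {
                result.add(m.group());        // whole match, quotes included
            } else if (m.group(1) != null) {
                result.add(m.group(1));       // contents of a double-quoted phrase
            } else if (m.group(2) != null) {
                result.add(m.group(2));       // contents of a single-quoted phrase
            } else {
                result.add(m.group());        // plain word
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "say 'hello world' it's \"a b\"";
        System.out.println(tokenize(input, true));   // [say, 'hello world', it's, "a b"]
        System.out.println(tokenize(input, false));  // [say, hello world, it's, a b]
    }
}
```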

Upvotes: 0

wvdz

Reputation: 16641

The only solution I could think of was to write a small parser that walks through your input string and keeps a flag recording whether a quote is currently open.

public static void main(String[] args)
{
    String input = "Here  \"     is   one   final   \"    example";
    List<String> tokens = new ArrayList<>();
    boolean inQuote = false;

    input = input.trim();
    String token = "";
    for (char c : input.toCharArray())
    {
        if (c == ' ' && !inQuote)
        {
            if (token.length() > 0)
                tokens.add(token);
            token = "";
        }
        else
        {
            token += c;
            if (c == '"')
            {
                inQuote = !inQuote;
                if (!inQuote)
                {
                    tokens.add(token);
                    token = "";
                }
            }
        }
    }
    if (token.length() > 0)
        tokens.add(token);
    System.out.println(tokens);
}

Upvotes: 0

acdcjunior

Reputation: 135762

Here's something you can try:

public static void main (String[] args) {
    System.out.println(Arrays.toString(splitOnSpacesButNotOnStrings(
         "The       book   \"Harry Potter\"   is   cool"
    )));
    System.out.println(Arrays.toString(splitOnSpacesButNotOnStrings(
         "Here  \"     is   one   final   \"    example"
    )));
    // Output:
    // [The, book, "Harry Potter", is, cool]
    // [Here, "     is   one   final   ", example]
}

private static String[] splitOnSpacesButNotOnStrings(String s) {
    return s.split(" +(?=(?:(?:.*?\"){2})*[^\"]*$)");
}

It will only work, though, if the quotes in your string are balanced, that is, if it contains an even number of " characters.
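
If the input comes straight from a user, it may be worth checking that condition before splitting. The following is a minimal sketch, assuming you would rather reject unbalanced input than get a surprising split. The class name BalancedQuoteCheck and the methods hasBalancedQuotes and splitOrFail are made-up names, and throwing IllegalArgumentException is just one possible policy.

```java
import java.util.Arrays;

class BalancedQuoteCheck {
    // Hypothetical helper: true when the string contains an even number of '"' characters.
    static boolean hasBalancedQuotes(String s) {
        long count = s.chars().filter(ch -> ch == '"').count();
        return count % 2 == 0;
    }

    // Hypothetical helper: rejects unbalanced input, otherwise splits with the regex from above.
    static String[] splitOrFail(String s) {
        if (!hasBalancedQuotes(s)) {
            throw new IllegalArgumentException("Unbalanced quotes in: " + s);
        }
        return s.split(" +(?=(?:(?:.*?\"){2})*[^\"]*$)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                splitOrFail("The       book   \"Harry Potter\"   is   cool")));
        System.out.println(hasBalancedQuotes("Here \" is one final example")); // false
    }
}
```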

Upvotes: 0

Shiladittya Chakraborty

Reputation: 4418

You can try this:

String str = "Here  \"     is   one   final   \"    example";
Pattern regex = Pattern.compile("[^\\s\"']+|\"([^\"]*)\"|'([^']*)'");
Matcher regexMatcher = regex.matcher(str);
while (regexMatcher.find()) {
  System.out.println(regexMatcher.group());
} 

Upvotes: 0
