Reputation: 709
Given a string entered by a user, I'm trying to split it on whitespace and get each token.
But I'm having trouble when a token is enclosed in quotation marks. Here are some examples to clarify:
User input: that is cool
Expected Output:
that
is
cool
User input: The book "Harry Potter" is cool
Expected Output:
The
book
"Harry Potter"
is
cool
User input: Here " is one final " example
Expected Output:
Here
" is one final "
example
This is what I have so far:
public static void main(String[] args) {
    String input;
    Scanner in = new Scanner(System.in);
    System.out.print("User input: ");
    input = in.nextLine();
    input = input.trim();
    input = input.replaceAll("\\s+", " ");
    String[] a = input.split(" ");
    for (String c: a) {
        System.out.println(c);
    }
}
It works for the first example, but for the examples with quotation marks it also splits on the spaces inside the quoted tokens. Example 3 output:
Here
"
is
one
final
"
example
Upvotes: 3
Views: 316
Reputation: 424993
Here's how to do it in one line:
String[] terms = input.trim().split(" +(?=(([^\"]*\"){2})*[^\"]*$)");
This works by splitting on space(s) only when not within quotes, where "when not within quotes" is defined as "followed by an even number of quotes".
The call to trim() is optional given your examples, but it caters for the user entering leading spaces.
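To illustrate why the trim() call can matter, here is a minimal sketch. The class and method names (TrimDemo, splitRaw, splitTrimmed) are made up for this example. When the input starts with spaces, the first separator match sits at index 0, so split() returns an empty leading element unless the input is trimmed first.

```java
import java.util.Arrays;

public class TrimDemo {
    // Same regex as above: one or more spaces followed by an even number of quotes.
    static final String SPLIT_REGEX = " +(?=(([^\"]*\"){2})*[^\"]*$)";

    // Hypothetical helper: split without trimming first.
    static String[] splitRaw(String input) {
        return input.split(SPLIT_REGEX);
    }

    // Hypothetical helper: trim, then split.
    static String[] splitTrimmed(String input) {
        return input.trim().split(SPLIT_REGEX);
    }

    public static void main(String[] args) {
        String input = "  The book \"Harry Potter\"  ";
        // Leading spaces yield an empty first element; trailing empties are dropped by split().
        System.out.println(Arrays.toString(splitRaw(input)));     // [, The, book, "Harry Potter"]
        System.out.println(Arrays.toString(splitTrimmed(input))); // [The, book, "Harry Potter"]
    }
}
```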
Some test code:
String input = "Here \" is one final \" example";
String[] terms = input.trim().split(" +(?=(([^\"]*\"){2})*[^\"]*$)");
Arrays.stream(terms).forEach(System.out::println);
Output:
Here
" is one final "
example
Upvotes: 0
Reputation: 2598
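What about this:
public static void main(String[] args) {
    // String s = "that is cool";
    // String s = "The book \"Harry Potter\" is cool";
    String s = "Here \" is one final \" example";
    Scanner scanner = new Scanner(s);
    // Split on spaces that are followed by an even number of quotes.
    // [^\"]*\" is used rather than .*?\" because . can itself match a quote,
    // which breaks the count when the input has more than one quoted segment.
    scanner.useDelimiter(" +(?=(?:(?:[^\"]*\"){2})*[^\"]*$)");
    while (scanner.hasNext()) {
        System.out.println(scanner.next());
    }
}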
Upvotes: 0
Reputation: 124215
Don't focus on the things you want to split on. It is easier to focus on the things you want to find as the result:
// \"[^\"]+\"  matches a quoted segment, quotes included
// \\S+        matches any other run of non-whitespace characters
private static final Pattern p = Pattern.compile("\"[^\"]+\"|\\S+");

public static List<String> splitTokensAndQuotes(String text) {
    List<String> result = new ArrayList<>();
    Matcher m = p.matcher(text);
    while (m.find()) {
        result.add(m.group());
    }
    return result;
}
Demo:
public static void main(String[] args) {
    splitTokensAndQuotes("that is cool")
        .forEach(System.out::println);
    System.out.println("------");
    splitTokensAndQuotes("the book \"Harry Potter\" is cool")
        .forEach(System.out::println);
    System.out.println("------");
    splitTokensAndQuotes("Here \" is one final \" example")
        .forEach(System.out::println);
    System.out.println("------");
}
Result:
that
is
cool
------
the
book
"Harry Potter"
is
cool
------
Here
" is one final "
example
------
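As a variation on the same find-based idea, the loop can be written as a stream. This is only a sketch: it assumes Java 9 or later, where Matcher.results() is available, and the class name TokenStream and method name tokens are made up for the example.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TokenStream {
    // Same pattern as above: a quoted segment, or a run of non-whitespace.
    private static final Pattern TOKEN = Pattern.compile("\"[^\"]+\"|\\S+");

    // Hypothetical helper: collect every match, in order, into a list.
    static List<String> tokens(String text) {
        return TOKEN.matcher(text)
                .results()               // Stream<MatchResult>, Java 9+
                .map(MatchResult::group) // full text of each match
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints: [Here, " is one final ", example]
        System.out.println(tokens("Here \" is one final \" example"));
    }
}
```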
Upvotes: 1
Reputation: 22224
You can use this pattern
Pattern pattern = Pattern.compile("\"([^\"]+)\"|'([^']+)'|\\S+");
to match words separated by spaces, as well as quoted segments together with the spaces inside them. It also handles single quotes, and it keeps "it's" as a single word, which you may or may not want.
You would then iterate through all the matches like this:
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    System.out.println(matcher.group());
}
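The pattern above has two capture groups, which makes it possible to drop the surrounding quotes if that is wanted. The following is a rough sketch of that idea; the class name QuoteAwareTokens, the method tokens, and the stripQuotes flag are assumptions introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteAwareTokens {
    // Group 1: content of a double-quoted segment; group 2: content of a single-quoted one.
    private static final Pattern PATTERN =
            Pattern.compile("\"([^\"]+)\"|'([^']+)'|\\S+");

    // Hypothetical helper: when stripQuotes is true, return the group content
    // without the enclosing quotes; otherwise return the whole match.
    static List<String> tokens(String input, boolean stripQuotes) {
        List<String> out = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        while (m.find()) {
            if (stripQuotes && m.group(1) != null) {
                out.add(m.group(1));
            } else if (stripQuotes && m.group(2) != null) {
                out.add(m.group(2));
            } else {
                out.add(m.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "say 'hello world' it's \"a b\"";
        System.out.println(tokens(input, false)); // [say, 'hello world', it's, "a b"]
        System.out.println(tokens(input, true));  // [say, hello world, it's, a b]
    }
}
```

Note that "it's" stays in one piece because the match starts at the letter "i", where only the \S+ alternative can apply.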
Upvotes: 0
Reputation: 16641
Only solution I could think of was to write a little parser that simply traverses your input string and keeps a flag that tells you if you have an open quote or not.
public static void main(String[] args)
{
    String input = "Here \" is one final \" example";
    List<String> tokens = new ArrayList<>();
    boolean inQuote = false;
    input = input.trim();
    String token = "";
    for (char c : input.toCharArray())
    {
        if (c == ' ' && !inQuote)
        {
            if (token.length() > 0)
                tokens.add(token);
            token = "";
        }
        else
        {
            token += c;
            if (c == '"')
            {
                inQuote = !inQuote;
                if (!inQuote)
                {
                    tokens.add(token);
                    token = "";
                }
            }
        }
    }
    if (token.length() > 0)
        tokens.add(token);
    System.out.println(tokens);
}
Upvotes: 0
Reputation: 135762
Here's something you can try:
public static void main(String[] args) {
    System.out.println(Arrays.toString(splitOnSpacesButNotOnStrings(
        "The book \"Harry Potter\" is cool"
    )));
    System.out.println(Arrays.toString(splitOnSpacesButNotOnStrings(
        "Here \" is one final \" example"
    )));
    // Output:
    // [The, book, "Harry Potter", is, cool]
    // [Here, " is one final ", example]
}

private static String[] splitOnSpacesButNotOnStrings(String s) {
    // Split on spaces followed by an even number of quotes. [^\"]*\" is used
    // rather than .*?\" because . can itself match a quote, which breaks the
    // count when the input has more than one quoted segment.
    return s.split(" +(?=(?:(?:[^\"]*\"){2})*[^\"]*$)");
}
It will only work, though, if the quotes in your string are balanced, that is, if it contains an even number of " characters.
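To make the balanced-quotes caveat concrete, here is a small sketch of what a lookahead split of this kind does with a stray quote, plus one possible guard. The class name UnbalancedDemo and the helper hasBalancedQuotes are assumptions for illustration, not part of the approach above.

```java
import java.util.Arrays;

public class UnbalancedDemo {
    // Split on spaces that are followed by an even number of double quotes.
    static String[] split(String s) {
        return s.split(" +(?=(?:(?:[^\"]*\"){2})*[^\"]*$)");
    }

    // Hypothetical guard: true when the input has an even number of double quotes.
    static boolean hasBalancedQuotes(String s) {
        return s.chars().filter(c -> c == '"').count() % 2 == 0;
    }

    public static void main(String[] args) {
        String input = "say \"hello world";
        // The space before the quote is followed by one quote (odd), so it is
        // not a separator; the space inside the unterminated quote is.
        System.out.println(Arrays.toString(split(input))); // [say "hello, world]
        System.out.println(hasBalancedQuotes(input));      // false
    }
}
```

Checking hasBalancedQuotes before splitting is one way to reject or repair such input up front; how to handle it is a design decision left to the caller.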
Upvotes: 0
Reputation: 4418
You can try this:
String str = "Here \" is one final \" example";
Pattern regex = Pattern.compile("[^\\s\"']+|\"([^\"]*)\"|'([^']*)'");
Matcher regexMatcher = regex.matcher(str);
while (regexMatcher.find()) {
    System.out.println(regexMatcher.group());
}
Upvotes: 0