dsw88

Reputation: 4592

Java Scanner - Delimit by spaces unless quotation marks are present?

I'm trying to use the Scanner class in Java to get data from a configuration file. The file's elements are delimited by whitespace. However, if a phrase or element should be interpreted as a string literal (including whitespace), then double or single quotes are placed around the element. This gives files that look like this:

> R 120 Something AWord

> P 160 SomethingElse "A string literal"

The Java Scanner class delimits by whitespace by default. It has a useDelimiter() method that takes a regular expression to specify a different delimiter for the text. I'm not good with regular expressions, however, so I'm not sure how I'd do this.

How can I delimit by whitespace, unless there are quotes surrounding something?

Upvotes: 3

Views: 3974

Answers (1)

DaoWen

Reputation: 33029

You can use the scanner.findInLine(pattern) method to specify that you want to keep string literals from being split. You just need a regular expression that will match a quote-less token or one in quotes. This one might work:

"[^\"\\s]+|\"(\\\\.|[^\\\\\"])*\""

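(That regex is extra complicated because it handles backslash escapes inside the string literal. Note that it only recognizes double-quoted literals, and the matched token still includes the surrounding quotes.)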

Example:

String rx = "[^\"\\s]+|\"(\\\\.|[^\\\\\"])*\"";
Scanner scanner = new Scanner("P 160 SomethingElse \"A string literal\" end");
System.out.println(scanner.findInLine(rx)); // => P
System.out.println(scanner.findInLine(rx)); // => 160
System.out.println(scanner.findInLine(rx)); // => SomethingElse
System.out.println(scanner.findInLine(rx)); // => "A string literal"
System.out.println(scanner.findInLine(rx)); // => end
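Since findInLine returns null once nothing more matches on the current line, the calls above can be wrapped in a loop. The following is only a sketch under a few assumptions: the class name QuotedTokens and the method tokenizeLine are made up for illustration, the extra single-quote branch in the regex is an extension (the question mentions single quotes, which the regex above does not cover), and the surrounding quotes are stripped without unescaping anything inside the literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class QuotedTokens {
    // Bare token, double-quoted literal, or single-quoted literal.
    // The single-quote alternative is an assumed extension of the regex above.
    private static final Pattern TOKEN = Pattern.compile(
        "[^\"'\\s]+|\"(?:\\\\.|[^\\\\\"])*\"|'(?:\\\\.|[^\\\\'])*'");

    // Hypothetical helper: splits one line into tokens, keeping quoted text together.
    static List<String> tokenizeLine(String line) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(line)) {
            String token;
            // findInLine returns null when no further match exists on this line.
            while ((token = scanner.findInLine(TOKEN)) != null) {
                char first = token.charAt(0);
                if (first == '"' || first == '\'') {
                    // Drop the surrounding quotes; escape sequences are left as-is.
                    token = token.substring(1, token.length() - 1);
                }
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [P, 160, SomethingElse, A string literal, end]
        System.out.println(tokenizeLine("P 160 SomethingElse \"A string literal\" end"));
    }
}
```

If each line of the file is one record, a helper like this could be called once per line read with nextLine(), which keeps the line boundaries intact.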

The findInLine method, as the name suggests, only works within the current line, and it returns null once nothing more matches on that line. If you want to search the whole input you can use findWithinHorizon instead. You can pass 0 as the horizon to tell it to use an unlimited horizon:

scanner.findWithinHorizon(rx, 0);
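As a rough illustration of the unlimited-horizon variant, the sketch below collects every token from a multi-line input in one loop. The class name WholeInputTokens and the method allTokens are hypothetical, and the sample input is just the two lines from the question joined with newlines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class WholeInputTokens {
    // Same regex as in the answer: a bare token or a double-quoted literal.
    private static final Pattern TOKEN =
        Pattern.compile("[^\"\\s]+|\"(\\\\.|[^\\\\\"])*\"");

    // Hypothetical helper: returns all tokens in the input, ignoring line breaks.
    static List<String> allTokens(String input) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            String token;
            // A horizon of 0 means "no limit"; null signals that no match remains.
            while ((token = scanner.findWithinHorizon(TOKEN, 0)) != null) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String config = "R 120 Something AWord\nP 160 SomethingElse \"A string literal\"\n";
        // prints [R, 120, Something, AWord, P, 160, SomethingElse, "A string literal"]
        System.out.println(allTokens(config));
    }
}
```

Two things to keep in mind with this approach: the result no longer tells you where one line ends and the next begins, and because the quoted branch of the regex accepts any character other than a backslash or a double quote, a quoted literal can span a line break.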

Upvotes: 5
