Reputation: 55
I'm using Scanner with a delimiter to tokenize my .txt file (it's a homework assignment). The first version of the file looks like this:
5,5,5,6,5,8,9,5,6,8, good, very good, excellent, good
7,7,8,7,6,7,8,8,9,7,very good, Good, excellent, very good
8,7,6,7,8,7,5,6,8,7 ,GOOD, VERY GOOD, GOOD, AVERAGE
9,9,9,8,9,7,9,8,9,9 ,Excellent, very good, very good, excellent
7,8,8,7,8,7,8,9,6,8 ,very good, good, excellent, excellent
6,5,6,4,5,6,5,6,6,6 ,good, average, good, good
7,8,7,7,6,8,7,8,6,6 ,good, very good, good, very good
5,7,6,7,6,7,6,7,7,7 ,excellent, very good, very good, very good
For that version I used useDelimiter("[ ]*(,)[ ]*").
The second version of the file looks like this:
5 5 5 6 5 8 9 5 6 8 good, very good, excellent, good
7 7 8 7 6 7 8 8 9 7 very good, Good, excellent, very good
8 7 6 7 8 7 5 6 8 7 GOOD, VERY GOOD, GOOD, AVERAGE
9 9 9 8 9 7 9 8 9 9 Excellent, very good, very good, excellent
7 8 8 7 8 7 8 9 6 8 very good, good, excellent, excellent
6 5 6 4 5 6 5 6 6 6 good, average, good, good
7 8 7 7 6 8 7 8 6 6 good, very good, good, very good
5 7 6 7 6 7 6 7 7 7 excellent, very good, very good, very good
And I can't come up with a regex that separates the numbers by spaces and the words by commas. Essentially I need an array with 14 values ("very good" being a single value).
Note there are multiple spaces (this is done on purpose to make it harder for us).
So any sort of help would be appreciated.
P.S. We're only allowed to use delimiters (no split etc.).
Upvotes: 5
Views: 185
Reputation: 6887
Note that Scanner allows you to change the delimiter at any time. If you can rely on your input text always having 10 numbers at the beginning and 4 word groups at the end, then you can simply start with a delimiter that just splits on whitespace (\s+) and, after 10 calls to nextInt(), switch to a delimiter that splits on a comma and optional surrounding whitespace (\s*,\s*).
Something like:
String input = "5 5 5 6 5 8 9 5 6 8 good, very good, excellent, good";
Scanner scanner = new Scanner(input).useDelimiter("\\s+");
int[] results = new int[14];
// The first 10 tokens are integers separated by one or more whitespace characters.
for (int i = 0; i < 10; ++i) {
    results[i] = scanner.nextInt();
}
// Switch to a comma delimiter for the 4 word groups,
// and skip the whitespace left over after the last number.
scanner.useDelimiter("\\s*,\\s*");
scanner.skip("\\s*");
for (int i = 10; i < 14; ++i) {
    String wordPhrase = scanner.next();
    int wordValue;
    if ("average".equalsIgnoreCase(wordPhrase))
        wordValue = 1;
    else if ("good".equalsIgnoreCase(wordPhrase))
        wordValue = 2;
    else if ("very good".equalsIgnoreCase(wordPhrase))
        wordValue = 3;
    else if ("excellent".equalsIgnoreCase(wordPhrase))
        wordValue = 4;
    else
        wordValue = 0; // unrecognized phrase
    results[i] = wordValue;
}
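Since the file in the question has several lines, the same two-delimiter idea can be applied once per line. The following is only a sketch under assumptions: the class name RatingLineParser and the helpers parseLine and parseAll are made up for illustration, every non-blank line is assumed to hold exactly 10 integers followed by 4 comma-separated phrases, and the values are kept as strings rather than mapped to numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class, not part of the original answer.
class RatingLineParser {

    // Parses one line: 10 whitespace-separated integers, then 4 comma-separated phrases.
    static String[] parseLine(String line) {
        String[] values = new String[14];
        Scanner scanner = new Scanner(line).useDelimiter("\\s+");
        for (int i = 0; i < 10; i++) {
            values[i] = String.valueOf(scanner.nextInt());
        }
        // Switch delimiters and skip the whitespace left after the last number.
        scanner.useDelimiter("\\s*,\\s*");
        scanner.skip("\\s*");
        for (int i = 10; i < 14; i++) {
            // trim() guards against trailing whitespace at the end of the line.
            values[i] = scanner.next().trim();
        }
        return values;
    }

    // Applies parseLine to every non-blank line of the given text.
    static List<String[]> parseAll(String text) {
        List<String[]> rows = new ArrayList<>();
        Scanner lines = new Scanner(text);
        while (lines.hasNextLine()) {
            String line = lines.nextLine();
            if (!line.trim().isEmpty()) {
                rows.add(parseLine(line));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String text = "5 5 5 6 5   8 9 5 6 8 good, very good, excellent, good\n"
                    + "7 7 8 7 6 7 8 8 9 7 very good, Good, excellent, very good\n";
        for (String[] row : parseAll(text)) {
            System.out.println(String.join("|", row));
        }
    }
}
```

Reading lines with hasNextLine() and nextLine() stays within Scanner, but whether that counts as "delimiters only" for the assignment is something to check against its rules.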
It's also possible to do this with a single delimiter regex using zero-width lookaround assertions, but this is probably a bit advanced for a simple homework problem.
Upvotes: 2
Reputation: 6234
This should work; the key is the positive lookbehind ((?<=...)) and alternation (|):
String input = "9 9 9 8 9 7 9 8 9 9 Excellent, very good, very good, excellent";
Scanner s = new Scanner(input).useDelimiter("(?<=\\d)[\\s,]+|\\s*,\\s*");
while (s.hasNext()) {
    System.out.println("Token: ." + s.next() + ".");
}
Prints:
Token: .9.
Token: .9.
Token: .9.
Token: .8.
Token: .9.
Token: .7.
Token: .9.
Token: .8.
Token: .9.
Token: .9.
Token: .Excellent.
Token: .very good.
Token: .very good.
Token: .excellent.
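As a rough sketch of how this single delimiter could feed the 14-value array the question asks for, the tokens can be collected into a list instead of printed. The class name SingleDelimiterDemo and the method tokenize are hypothetical names chosen for this example, and the input is assumed to be one line at a time. Because the first alternative also accepts commas after a digit, the same pattern appears to handle lines from the first (comma-separated) version of the file as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not part of the original answer.
class SingleDelimiterDemo {

    // Splits one line using the lookbehind-plus-alternation delimiter from the answer.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        // Alternative 1: whitespace and/or commas directly after a digit.
        // Alternative 2: a comma with optional whitespace around it (between phrases).
        Scanner s = new Scanner(line).useDelimiter("(?<=\\d)[\\s,]+|\\s*,\\s*");
        while (s.hasNext()) {
            tokens.add(s.next());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // One line from each version of the file in the question.
        String spaced = "9 9 9 8 9   7 9 8 9 9 Excellent, very good, very good, excellent";
        String commas = "8,7,6,7,8,7,5,6,8,7 ,GOOD, VERY GOOD, GOOD, AVERAGE";
        System.out.println(tokenize(spaced));
        System.out.println(tokenize(commas));
    }
}
```

The space inside "very good" is not treated as a delimiter because it is neither preceded by a digit nor adjacent to a comma.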
Upvotes: 4
Reputation: 1419
You can try this one: (((?<=[0-9]+)\s*(?=[0-9]+))|(,\s*(?=[a-zA-Z]+))|((?<=[0-9]+)\s*(?=[a-zA-Z]+))). It looks awful but should work.
Upvotes: 2
Reputation: 12843
Try this:
String[] str = expression.split("(,\\s+)|(\\s+)");
Upvotes: 0