Reputation: 563
I am trying to use a StringTokenizer on a list of words as below:
String sentence = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\""; // etc.
When I use StringTokenizer and give a space as the delimiter, as below:
StringTokenizer tokens = new StringTokenizer(sentence, " ");
I was expecting the output to be separate tokens as below:
Name:jon
location:3333 abc street
country:usa
But the StringTokenizer also splits on the spaces inside the value of location, so the output looks like this:
Name:jon
location:3333
abc
street
country:usa
Please let me know how I can fix this. If I need a regex, what kind of expression should I specify?
Upvotes: 1
Views: 2802
Reputation: 785266
This can be easily handled using a CSV Reader.
String str = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
// prepare String for CSV parsing
CsvReader reader = CsvReader.parse(str.replaceAll("\" *: *\"", ":"));
reader.setDelimiter(' '); // use space as the delimiter
reader.readRecord(); // read CSV record
for (int i=0; i<reader.getColumnCount(); i++) // loop thru columns
System.out.printf("Scol[%d]: [%s]%n", i, reader.get(i));
// Alternative without a CSV reader: a pure regex solution
Pattern p = Pattern.compile("(.+?)(\\s+(?=(?:(?:[^\"]*\"){2})*[^\"]*$)|$)");
Matcher m = p.matcher(str);
for (int i=0; m.find(); i++)
System.out.printf("Scol[%d]: [%s]%n", i, m.group(1).replace("\"", ""));
OUTPUT:
Scol[0]: [Name:jon]
Scol[1]: [location:3333 abc street]
Scol[2]: [country:usa]
I am using this regex:
(.+?)(\\s+(?=(?:(?:[^\"]*\"){2})*[^\"]*$)|$)
Breaking it down now into smaller chunks.
PS: DQ represents Double quote
(?:[^\"]*\")  -  0 or more non-DQ characters followed by one DQ (RE1)
(?:[^\"]*\"){2}  -  Exactly a pair of the above RE1
(?:(?:[^\"]*\"){2})*  -  0 or more occurrences of a pair of RE1
(?:(?:[^\"]*\"){2})*[^\"]*$  -  0 or more occurrences of a pair of RE1, followed by 0 or more non-DQ characters, followed by end of string (RE2)
(?=(?:(?:[^\"]*\"){2})*[^\"]*$)  -  Positive lookahead for the above RE2
.+?  -  Match 1 or more characters (the ? makes the match non-greedy)
\\s+  -  Should be followed by one or more whitespace characters
(\\s+(?=RE2)|$)  -  Should be followed by whitespace (which must satisfy the RE2 lookahead) or by end of string
In short: match one or more characters of any kind, followed by either whitespace or the end of the string. The whitespace must be followed by an EVEN number of DQs. Hence a space outside double quotes is matched as a separator, while a space inside double quotes is not (since it is followed by an odd number of DQs).
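For anyone who wants to try the regex on its own, here is a minimal self-contained sketch that wraps it in a helper. The class name QuoteAwareSplitter and the method split are made up for illustration; only the pattern itself comes from the answer above, and it assumes the quotes in the input are balanced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
final class QuoteAwareSplitter {

    // Same pattern as in the answer: whitespace ends a token only when an even
    // number of double quotes follows it, i.e. when it lies outside quotes.
    private static final Pattern TOKEN = Pattern.compile(
            "(.+?)(\\s+(?=(?:(?:[^\"]*\"){2})*[^\"]*$)|$)");

    // Splits the input on unquoted whitespace and strips the double quotes.
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1).replace("\"", ""));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String str = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
        for (String token : split(str)) {
            System.out.println(token);
        }
    }
}
```

Run against the sample string, this should print the same three tokens as the OUTPUT section above, without the Scol[i] prefix.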
Upvotes: 5
Reputation: 234807
StringTokenizer is too simple-minded for this job. If you don't need to deal with quote marks inside the values, you can try this regex:
String s = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
Pattern p = Pattern.compile("\"([^\"]*)\"");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(1));
}
Output:
Name
jon
location
3333 abc street
country
usa
This won't handle internal quote marks within values—where the output should be, e.g.,
Name:Fred ("Freddy") Jones
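If you want the key:value lines from the question rather than one string per line, the matches from the regex above can be consumed two at a time. This is only a sketch under the assumption that quoted strings strictly alternate key, value, key, value; the class name QuotedPairs and the method parse are invented for the example, and it has the same limitation regarding internal quote marks.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
final class QuotedPairs {

    // Same pattern as above: captures the text between two double quotes.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Assumes quoted strings alternate key, value, key, value.
    // A trailing key without a value is silently dropped.
    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>(); // keeps input order
        Matcher m = QUOTED.matcher(input);
        String key = null;
        while (m.find()) {
            if (key == null) {
                key = m.group(1);            // first of a pair: the key
            } else {
                pairs.put(key, m.group(1));  // second of a pair: the value
                key = null;
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String s = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
        for (Map.Entry<String, String> e : parse(s).entrySet()) {
            System.out.println(e.getKey() + ":" + e.getValue());
        }
    }
}
```

A LinkedHashMap is used so the pairs come back in the order they appear in the input; a plain HashMap would work too if order does not matter.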
Upvotes: 2
Reputation: 474
You can use JSON. It looks like you are using a JSON-like schema, so search around a bit and try a JSON implementation.
String sentence = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\""; // etc.
These would be key/value pairs in JSON: Name is the key and jon is the value, location is the key and 3333 abc street is the value, and so on.
Give it a try. Here is one link http://www.mkyong.com/java/json-simple-example-read-and-write-json/
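Note that the sentence as written is not valid JSON yet: the pairs are separated by spaces instead of commas and there are no surrounding braces. Here is a rough sketch of reshaping it first, assuming values never contain double quotes; the class name JsonShape and the method toJsonObject are made up for the example, and the result would still need to be passed to a JSON library of your choice.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class JsonShape {

    // Turns  "k1":"v1" "k2":"v2"  into  {"k1":"v1","k2":"v2"}.
    // Assumes values contain no double quotes, so a quote followed by
    // whitespace and another quote can only be the gap between two pairs.
    static String toJsonObject(String sentence) {
        String body = sentence.trim().replaceAll("\"\\s+\"", "\",\"");
        return "{" + body + "}";
    }

    public static void main(String[] args) {
        String sentence = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
        System.out.println(toJsonObject(sentence));
    }
}
```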
Edit: This is a rather crude approach, but you can try something like this:
sentence = sentence.replaceAll("\" \"", "\"|\""); // mark the gap between pairs with '|'
StringTokenizer tokens = new StringTokenizer(sentence, "|"); // assumes '|' never appears in the data
Upvotes: 1