raghuram gururajan

Reputation: 563

StringTokenizer: how to ignore spaces within a quoted string

I am trying to use a StringTokenizer on a list of words like this:

String sentence = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\""; // etc.

When I use a StringTokenizer with a space as the delimiter:

StringTokenizer tokens = new StringTokenizer(sentence, " ");

I was expecting the output to be these separate tokens:

Name:jon

location:3333 abc street

country:usa

But the StringTokenizer also splits the value of location on its spaces, so the output looks like this:

Name:jon

location:3333

abc

street

country:usa
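A minimal, self-contained reproduction of the behaviour above, assuming the sentence holds just the three pairs shown (the class and method names are hypothetical). StringTokenizer treats every delimiter character the same way and has no notion of quoting, so the spaces inside "3333 abc street" split the value as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDemo {
    // Collects every token StringTokenizer produces for the given delimiter characters.
    static List<String> tokenize(String s, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(s, delims);
        while (tokens.hasMoreTokens()) {
            out.add(tokens.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String sentence = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
        // Every space is a delimiter, including the two inside the quoted location value,
        // so this yields 5 tokens instead of the expected 3.
        for (String t : tokenize(sentence, " ")) {
            System.out.println(t);
        }
    }
}
```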

How can I fix this? And if I need to use a regex, what expression should I specify?

Upvotes: 1

Views: 2802

Answers (3)

anubhava

Reputation: 785266

This can easily be handled using a CSV reader.

import com.csvreader.CsvReader; // from the JavaCSV library

String str = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";

// prepare the String for CSV parsing: "Name":"jon" becomes "Name:jon"
CsvReader reader = CsvReader.parse(str.replaceAll("\" *: *\"", ":"));
reader.setDelimiter(' '); // use space as the delimiter
reader.readRecord(); // read one CSV record (throws IOException)
for (int i = 0; i < reader.getColumnCount(); i++) // loop through columns
    System.out.printf("Scol[%d]: [%s]%n", i, reader.get(i));
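The CSV reader above comes from a third-party library. To see what its quote-aware splitting amounts to without the dependency, here is a hand-rolled state machine as a minimal alternative. It is a different technique from the answer and only a sketch: it assumes balanced quotes and does not handle CSV-style doubled quotes ("") inside a field. `QuoteAwareSplit` is a hypothetical name. Because it drops the quote characters as it goes, the replaceAll preprocessing step is not needed.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplit {
    // Splits on spaces that lie outside double quotes; quote characters are dropped.
    // Assumption: quotes are balanced and never escaped inside values.
    static List<String> split(String s) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;           // toggle state, don't keep the quote
            } else if (c == ' ' && !inQuotes) {
                if (current.length() > 0) {     // runs of spaces produce no empty fields
                    out.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);              // ordinary char, or a space inside quotes
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
        List<String> cols = split(str);
        for (int i = 0; i < cols.size(); i++) {
            System.out.printf("Scol[%d]: [%s]%n", i, cols.get(i));
        }
    }
}
```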

Update: and here is a pure Java SDK solution (no third-party library):

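import java.util.regex.Matcher;
import java.util.regex.Pattern;

// str is the same input string as above
Pattern p = Pattern.compile("(.+?)(\\s+(?=(?:(?:[^\"]*\"){2})*[^\"]*$)|$)");
Matcher m = p.matcher(str);
for (int i = 0; m.find(); i++)
    System.out.printf("Scol[%d]: [%s]%n", i, m.group(1).replace("\"", ""));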

OUTPUT:

Scol[0]: [Name:jon]
Scol[1]: [location:3333 abc street]
Scol[2]: [country:usa]

Live Demo: http://ideone.com/WO0NK6

Explanation (as requested in the OP's comments):

I am using this regex:

(.+?)(\\s+(?=(?:(?:[^\"]*\"){2})*[^\"]*$)|$)

Breaking it down into smaller chunks:

Note: DQ stands for double quote.

(?:[^\"]*\")                    0 or more non-DQ characters followed by one DQ (RE1)
(?:[^\"]*\"){2}                 Exactly a pair of above RE1
(?:(?:[^\"]*\"){2})*            0 or more occurrences of pair of RE1
(?:(?:[^\"]*\"){2})*[^\"]*$     0 or more occurrences of pair of RE1 followed by 0 or more non-DQ characters followed by end of string (RE2)
(?=(?:(?:[^\"]*\"){2})*[^\"]*$) Positive lookahead of above RE2
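To make RE2 concrete, this minimal sketch tests it against the remainder of the string after each space, which is what the lookahead does at each candidate split point. Only spaces outside quotes pass. `EvenQuotesCheck` and `splittableSpaces` are hypothetical names; the trailing $ is dropped because Matcher.matches() already requires the whole remainder to match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class EvenQuotesCheck {
    // RE2 without the trailing $: matches() anchors at the end of the input by itself.
    private static final Pattern RE2 = Pattern.compile("(?:(?:[^\"]*\"){2})*[^\"]*");

    // Index of every space whose remainder of the string matches RE2, i.e. every
    // space followed by an even number of double quotes (a space outside quotes).
    static List<Integer> splittableSpaces(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ' ' && RE2.matcher(s.substring(i + 1)).matches()) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
        // The spaces at 29 and 33 (inside "3333 abc street") are rejected.
        System.out.println(splittableSpaces(str)); // prints [12, 41]
    }
}
```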

.+?             Match 1 or more characters (the ? makes it non-greedy)
\\s+            Followed by one or more whitespace characters
(\\s+(?=RE2)|$) Followed by whitespace that satisfies RE2, or by end of string

In short: match one or more characters followed by either a space or the end of the string, where the space must be followed by an EVEN number of DQs. Hence a space outside double quotes is matched, and a space inside double quotes is not (since it is followed by an odd number of DQs).
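The same "space followed by an even number of DQs" rule can also serve directly as a String.split delimiter, which avoids the (.+?) capture and the find() loop. This is a variant of the answer's regex rather than the answer itself, sketched under the same assumption of balanced, unescaped quotes; `SplitOutsideQuotes` is a hypothetical name.

```java
import java.util.Arrays;

public class SplitOutsideQuotes {
    // Whitespace followed by an even number of DQs, i.e. whitespace outside quotes.
    // Each (?:[^"]*"[^"]*") consumes exactly one pair of DQs.
    private static final String DELIM = "\\s+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static String[] split(String s) {
        String[] parts = s.split(DELIM);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].replace("\"", "");   // strip the quotes, as in the answer
        }
        return parts;
    }

    public static void main(String[] args) {
        String str = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
        System.out.println(Arrays.toString(split(str)));
        // prints [Name:jon, location:3333 abc street, country:usa]
    }
}
```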

Upvotes: 5

Ted Hopp

Reputation: 234807

StringTokenizer is too simple-minded for this job. If you don't need to deal with quote marks inside the values, you can try this regex:

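import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
Pattern p = Pattern.compile("\"([^\"]*)\""); // a DQ, any non-DQ characters (captured), a DQ
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group(1));
}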

Output:

Name
jon
location
3333 abc street
country
usa

This won't handle quote marks inside values, for example when the output should be:

Name:Fred ("Freddy") Jones
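If the values do contain quote marks, one common convention (assumed here, since the question doesn't say) is JSON-style backslash escaping, e.g. "Name":"Fred (\"Freddy\") Jones". A minimal sketch under that assumption, which also pairs each key with its value to produce the key:value form the question asks for. `QuotedPairs`, `parse` and `unescape` are hypothetical names, and the unescaping is deliberately crude: it just drops the backslash from any escape sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedPairs {
    // A quoted string: DQ, then any mix of chars that are neither DQ nor backslash
    // and backslash-escaped chars (such as \"), then the closing DQ.
    private static final String QUOTED = "\"((?:[^\"\\\\]|\\\\.)*)\"";
    // A quoted key, a colon (optionally padded with whitespace), a quoted value.
    private static final Pattern PAIR = Pattern.compile(QUOTED + "\\s*:\\s*" + QUOTED);

    // Minimal unescape: replaces each backslash + char with just that char.
    static String unescape(String s) {
        return s.replaceAll("\\\\(.)", "$1");
    }

    static List<String> parse(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = PAIR.matcher(s);
        while (m.find()) {
            out.add(unescape(m.group(1)) + ":" + unescape(m.group(2)));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "\"Name\":\"Fred (\\\"Freddy\\\") Jones\" \"location\":\"3333 abc street\"";
        for (String token : parse(s)) {
            System.out.println(token);
        }
        // prints:
        // Name:Fred ("Freddy") Jones
        // location:3333 abc street
    }
}
```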

Upvotes: 2

surender8388

Reputation: 474

You can use JSON. It looks like your data is already in a JSON-like format, so search around a bit and try using a JSON library.

String sentence = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\""; // etc.

In JSON these become key/value pairs: Name is a key and jon is its value, location is a key and 3333 abc street is its value, and so on.

Give it a try. Here is one tutorial: http://www.mkyong.com/java/json-simple-example-read-and-write-json/
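The string is only a few characters away from a JSON object: the pairs need commas between them and the whole thing needs braces. A minimal JDK-only sketch of that conversion, assuming pairs are always separated by exactly one space and that no value contains the sequence `" "`; `JsonWrap` and `toJsonObject` are hypothetical names. The result could then be handed to a JSON parser such as the json-simple one used in the tutorial linked above.

```java
public class JsonWrap {
    // Turns "k1":"v1" "k2":"v2" into {"k1":"v1","k2":"v2"}.
    // String.replace works on literal text, so no regex escaping is involved.
    // Assumes exactly one space between pairs and no `" "` inside any value.
    static String toJsonObject(String sentence) {
        return "{" + sentence.trim().replace("\" \"", "\",\"") + "}";
    }

    public static void main(String[] args) {
        String sentence = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
        System.out.println(toJsonObject(sentence));
        // prints {"Name":"jon","location":"3333 abc street","country":"usa"}
    }
}
```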

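Edit: It's a bit of a silly answer, but you can also try something like this (assuming the | character never appears in your data):

sentence = sentence.replaceAll("\" \"", "\"|\""); // mark the boundary between pairs
StringTokenizer tokens = new StringTokenizer(sentence, "|");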

Upvotes: 1
