ASD

Reputation: 1441

string tokenizer in Java

I have a text file that contains data separated by '|'. I need to get each field (separated by '|') and process it. A line of the text file looks like this:

ABC|DEF||FGHT

I am using StringTokenizer (JDK 1.4) to get each field value. The problem is that I should get an empty string after DEF, but I am not getting the empty field between DEF and FGHT.

My result should be ABC, DEF, "", FGHT, but I am getting ABC, DEF, FGHT.

Upvotes: 23

Views: 100942

Answers (7)

Ashik ali

Reputation: 17

package com.java.String;

import java.util.StringTokenizer;

public class StringWordReverse {

    public static void main(String[] kam) {
        String s;
        String sReversed = "";
        System.out.println("Enter a string to reverse");
        s = "THIS IS ASHIK SKLAB";
        StringTokenizer st = new StringTokenizer(s);


        while (st.hasMoreTokens()) {
            sReversed = st.nextToken() + " " + sReversed;
        }

        System.out.println("Original string is : " + s);
        System.out.println("Reversed string is : " + sReversed);

    }
}

Output:

Enter a string to reverse

Original string is : THIS IS ASHIK SKLAB

Reversed string is : SKLAB ASHIK IS THIS

Upvotes: 0

Justin Gorny

Reputation: 41

Here is a way to split a string into tokens, where a token is one or more letters:

public static void main(String[] args) {
    Scanner scan = new Scanner(System.in);
    String s = scan.nextLine();
    s = s.replaceAll("[^A-Za-z]", " ");
    StringTokenizer arr = new StringTokenizer(s, " ");
    int n = arr.countTokens();
    System.out.println(n);
    while(arr.hasMoreTokens()){
        System.out.println(arr.nextToken());
    }
    scan.close();
}

Upvotes: 0

Here is another way to solve this problem:

   String str = "ABC|DEF||FGHT";
   // Passing true makes the tokenizer return the delimiters as tokens too
   StringTokenizer s = new StringTokenizer(str, "|", true);
   String currentToken = "", previousToken = "";

   while (s.hasMoreTokens()) {
       // Get the current token from the tokenized string
       currentToken = s.nextToken();

       // Two delimiters in a row mean there is an empty token between them
       if (currentToken.equals("|") && previousToken.equals("|")) {
           // Print "null" to denote the empty token
           System.out.println("null");
       } else if (!currentToken.equals("|")) {
           // Print only real tokens, never the delimiters themselves
           System.out.println(currentToken);
       }

       previousToken = currentToken;
   }

Upvotes: 2

sfussenegger

Reputation: 36095

Use the returnDelims flag and check for two consecutive occurrences of the delimiter:

String str = "ABC|DEF||FGHT";
String delim = "|";
StringTokenizer tok = new StringTokenizer(str, delim, true);

boolean expectDelim = false;
while (tok.hasMoreTokens()) {
    String token = tok.nextToken();
    if (delim.equals(token)) {
        if (expectDelim) {
            expectDelim = false;
            continue;
        } else {
            // unexpected delim means empty token
            token = null;
        }
    }

    System.out.println(token);
    expectDelim = true;
}

This prints:

ABC
DEF
null
FGHT

The API isn't pretty and is therefore considered legacy (i.e. "almost obsolete"). Use it only where pattern matching is too expensive (which should only be the case for extremely long strings) or where an API expects an Enumeration.

If you switch to String.split(String), make sure to quote the delimiter, because split treats its argument as a regular expression and '|' is a regex metacharacter. Quote it either manually ("\\|") or automatically using string.split(Pattern.quote(delim));
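
As a rough illustration of the automatic quoting mentioned above, here is a minimal sketch. The class name QuotedSplitDemo and the helper splitLiteral are made up for this example. Note that Pattern.quote was added in Java 5, so on JDK 1.4 the manual "\\|" escape is the option to use.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class QuotedSplitDemo {

    // Splits on a literal delimiter. Pattern.quote turns the delimiter into a
    // regex that matches it literally, so '|' is no longer read as alternation.
    static String[] splitLiteral(String input, String delim) {
        return input.split(Pattern.quote(delim));
    }

    public static void main(String[] args) {
        String[] fields = splitLiteral("ABC|DEF||FGHT", "|");
        // The empty field between the two delimiters is kept as "".
        System.out.println(Arrays.toString(fields)); // prints [ABC, DEF, , FGHT]
    }
}
```

Quoting the delimiter this way also keeps the code correct if the delimiter later changes to another metacharacter such as '.' or '*'.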

Upvotes: 16

Ryan Emerle

Reputation: 15811

StringTokenizer ignores empty elements. Consider using String.split, which is also available in 1.4.

From the javadocs:

StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.

Upvotes: 9

Desintegr

Reputation: 7090

From the StringTokenizer documentation:

StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.

The following code should work:

String s = "ABC|DEF||FGHT";
String[] r = s.split("\\|");
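
One thing that may matter for the file in the question: split(regex) keeps empty fields in the middle but drops trailing empty strings, so a line ending in "||" would lose its last fields. Passing a negative limit keeps them. The sketch below shows the difference; the class name SplitLimitDemo and the helper splitKeepingTrailing are made up for illustration, and the second sample line is an assumed input, not one from the question.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class SplitLimitDemo {

    // A negative limit tells split to keep trailing empty strings.
    static String[] splitKeepingTrailing(String line) {
        return line.split("\\|", -1);
    }

    public static void main(String[] args) {
        // Empty field in the middle: kept by the plain one-argument split.
        System.out.println("ABC|DEF||FGHT".split("\\|").length);          // prints 4

        // Empty fields at the end (assumed sample line): dropped by default.
        System.out.println("ABC|DEF||".split("\\|").length);              // prints 2

        // Same line with limit -1: both trailing empty fields survive.
        System.out.println(Arrays.toString(splitKeepingTrailing("ABC|DEF||"))); // prints [ABC, DEF, , ]
    }
}
```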

Upvotes: 63

Omry Yadan

Reputation: 33646

You can use the constructor that takes an extra 'returnDelims' boolean and pass true to it. This way you will also receive the delimiters, which lets you detect two delimiters in a row.

Alternatively, you can implement your own string tokenizer that does what you need; it's not that hard.
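
For the hand-rolled option, a minimal sketch could look like the following. It walks the string with indexOf and keeps empty fields. The class name SimpleFieldSplitter and the method splitFields are hypothetical, and the code assumes a single-character delimiter and Java 7 or later (generics and the diamond operator); on JDK 1.4 a raw ArrayList would be needed instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-rolled splitter, not part of any library.
class SimpleFieldSplitter {

    // Returns every field between delimiters, including empty ones at the
    // start, in the middle, and at the end of the line.
    static List<String> splitFields(String line, char delim) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        int pos;
        // Each delimiter found closes the field that began at 'start'.
        while ((pos = line.indexOf(delim, start)) >= 0) {
            fields.add(line.substring(start, pos));
            start = pos + 1;
        }
        // Whatever follows the last delimiter is the final field.
        fields.add(line.substring(start));
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitFields("ABC|DEF||FGHT", '|')); // prints [ABC, DEF, , FGHT]
    }
}
```

Because no regular expression is involved, there is nothing to escape, and the behaviour for leading and trailing delimiters is explicit in the code rather than hidden in a limit argument.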

Upvotes: 2
