Shah

Reputation: 5018

String splitting

I have a string. What is the best way to put the substrings between the $ signs into a list in Java?

String temp = "$abc$and$xyz$";

How can I get all the variables between the $ signs as a list in Java, e.g. [abc, xyz]?

I can do this with StringTokenizer, but I want to avoid it if possible. Thanks.

Upvotes: 0

Views: 751

Answers (8)

Santosh

Reputation: 2323

You can use

String temp = "$abc$and$xyz$";
// Pattern.quote makes "$" a literal delimiter instead of a regex anchor.
String[] array = temp.split(Pattern.quote("$"));
List<String> list = new ArrayList<String>();
for (int i = 0; i < array.length; i++) {
    list.add(array[i]);
}

Now the list holds the tokens. Note that it starts with an empty string, because the input begins with the delimiter.
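
For reference, here is a minimal sketch of what that split returns for the sample input, plus one way to skip empty tokens. The class name SplitDemo and the helper splitOnDollar are made up for illustration; they are not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SplitDemo {

    // Hypothetical helper: splits on a literal '$' and skips empty tokens.
    static List<String> splitOnDollar(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String token : text.split(Pattern.quote("$"))) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The raw split keeps the empty string before the first '$'
        // and drops the trailing empty string after the last one.
        String[] raw = "$abc$and$xyz$".split(Pattern.quote("$"));
        System.out.println(raw.length);                      // 4
        System.out.println(splitOnDollar("$abc$and$xyz$")); // [abc, and, xyz]
    }
}
```

Note that a plain split also returns "and", since it treats every $ as a separator rather than pairing them up.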

Upvotes: 0

Saurabh

Reputation: 7964

You can do this simply by writing your own code. The following does the job:

import java.util.ArrayList;
import java.util.List;

public class MyStringTokenizer {

/**
 * @param args
 */
public static void main(String[] args) {

    List <String> result = getTokenizedStringsList("$abc$efg$hij$");

    for(String token : result)
    {
        System.out.println(token);
    }

}

private static List<String> getTokenizedStringsList(String string) {

    List<String> tokenList = new ArrayList<String>();

    char[] in = string.toCharArray();
    int stringLength = in.length;

    for (int i = 0; i < stringLength;)
    {
        // Skip ahead to the next '$' and step past it.
        while (i < stringLength && in[i] != '$')
            i++;
        i++;

        // Collect characters up to the following '$' (or the end of the input).
        StringBuilder myBuilder = new StringBuilder();
        while (i < stringLength && in[i] != '$')
        {
            myBuilder.append(in[i]);
            i++;
        }

        // Skip empty tokens, such as the one after the final '$'.
        if (myBuilder.length() > 0)
            tokenList.add(myBuilder.toString());
    }
    return tokenList;
}

}

Upvotes: 0

Jay

Reputation: 27464

Basically I'd ditto Khotyn as the easiest solution. I see from your comment on his answer that you don't want zero-length tokens at the beginning and end.

That brings up the question: What happens if the string does not begin and end with $'s? Is that an error, or are they optional?

If it's an error, then just start with:

if (!text.startsWith("$") || !text.endsWith("$"))
  return "Missing $'s"; // or whatever you do on error

If that passes, fall into the split.

If the $'s are optional, I'd just strip them out before splitting. i.e.:

if (text.startsWith("$"))
  text=text.substring(1);
if (text.endsWith("$"))
  text=text.substring(0,text.length()-1);

Then do the split.

Sure, you could make more sophisticated regexes or use StringTokenizer or no doubt come up with dozens of other complicated solutions. But why bother? When there's a simple solution, use it.

PS There's also the question of what result you want to see if there are two $'s in a row, e.g. "$foo$$bar$". Should that give ["foo","bar"], or ["foo","","bar"]? Khotyn's split will give the second result, with zero-length strings. If you want the first result, you should split("\\$+").
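
Putting the optional-$ case and the two split variants together, a rough sketch might look like this. The class name StripThenSplit and the helper tokens are hypothetical names chosen for the example.

```java
import java.util.Arrays;
import java.util.List;

public class StripThenSplit {

    // Hypothetical helper: removes one optional leading and one optional
    // trailing '$', then splits on the given delimiter regex.
    static List<String> tokens(String text, String delimiterRegex) {
        if (text.startsWith("$")) {
            text = text.substring(1);
        }
        if (text.endsWith("$")) {
            text = text.substring(0, text.length() - 1);
        }
        return Arrays.asList(text.split(delimiterRegex));
    }

    public static void main(String[] args) {
        System.out.println(tokens("$abc$and$xyz$", "\\$")); // [abc, and, xyz]
        // A single-'$' delimiter keeps the empty token between "$$".
        System.out.println(tokens("$foo$$bar$", "\\$"));    // [foo, , bar]
        // "\\$+" treats a run of '$' as one delimiter.
        System.out.println(tokens("$foo$$bar$", "\\$+"));   // [foo, bar]
    }
}
```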

Upvotes: 1

polygenelubricants

Reputation: 383686

The pattern is simple enough that String.split should work here, but in the more general case, one alternative for StringTokenizer is the much more powerful java.util.Scanner.

    String text = "$abc$and$xyz$";
    Scanner sc = new Scanner(text);

    while (sc.findInLine("\\$([^$]*)\\$") != null) {
        System.out.println(sc.match().group(1));
    } // abc, xyz

The pattern to find is:

\$([^$]*)\$
  \_____/     i.e. literal $, a sequence of anything but $ (captured in group 1)
     1                 and another literal $

The […] is a character class. Something like [aeiou] matches one of any of the lowercase vowels. [^…] is a negated character class. [^aeiou] matches one of anything but the lowercase vowels.

(…) is used for grouping. (pattern) is a capturing group and creates a backreference.

The backslash preceding the $ (outside of a character class definition) is used to escape the $, which has a special meaning as the end-of-line anchor. That backslash is doubled in a String literal ("\\" is a String of length one containing a backslash).

This is not a typical usage of Scanner (usually the delimiter pattern is set, and tokens are extracted using next), but it does show how you'd use findInLine to find an arbitrary pattern (ignoring delimiters), and then use match() to access the MatchResult, from which you can get individual group captures.

You can also use this Pattern in a Matcher find() loop directly.

    Matcher m = Pattern.compile("\\$([^$]*)\\$").matcher(text);
    while (m.find()) {
        System.out.println(m.group(1));
    } // abc, xyz
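
Since the question asks for a list rather than printed output, the same find() loop can collect the captures. This is only a sketch; the class name DollarTokens and the method extract are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DollarTokens {

    // Literal '$', any run of non-'$' characters (group 1), literal '$'.
    private static final Pattern BETWEEN_DOLLARS = Pattern.compile("\\$([^$]*)\\$");

    // Hypothetical helper: returns the text of every $...$ pair, in order.
    static List<String> extract(String text) {
        List<String> result = new ArrayList<String>();
        Matcher m = BETWEEN_DOLLARS.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Each match consumes its closing '$', so "and" is not captured.
        System.out.println(extract("$abc$and$xyz$")); // [abc, xyz]
    }
}
```

Unlike a plain split, this pairs the $ signs up, which is why it yields [abc, xyz] rather than [abc, and, xyz].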


Upvotes: 4

extraneon

Reputation: 23950

I would go for a regex myself, like Riduidel said.

This special case is, however, simple enough that you can just treat the String as a character sequence, iterate over it char by char, detect the $ signs, and grab the strings yourself.

On a side note, I would consider different demarcation characters to make the string more readable to humans. For instance, use $ as start-of-sequence and something else as end-of-sequence, or something like what I think the Bash shell uses: ${some_value}. As said, the computer doesn't care, but you might when debugging your string :)

As for an appropriate regex, something like (\\$.*\\$)* or so should do, though .* is greedy, so [^$]* between the delimiters is probably safer. I'm no expert on regexes (see http://www.regular-expressions.info for nice info on regexes).
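
One thing to watch with a pattern built on .* is greediness. The small sketch below compares it with a negated character class; the class name GreedyDemo and the helper firstMatch are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {

    // Hypothetical helper: returns the first substring matching the regex, or null.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "$abc$and$xyz$";
        // Greedy .* runs to the last '$' in the input.
        System.out.println(firstMatch("\\$.*\\$", text));    // $abc$and$xyz$
        // [^$]* cannot cross a '$', so the match stops at the next one.
        System.out.println(firstMatch("\\$[^$]*\\$", text)); // $abc$
    }
}
```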

Upvotes: 1

khotyn

Reputation: 954

Just try this one: temp.split("\\$");

Upvotes: 1

Mike Q

Reputation: 23219

If you want a simple split function, use Apache Commons Lang, which has StringUtils.split. The Java one takes a regex, which can be overkill or confusing.

Upvotes: 0

Riduidel

Reputation: 22292

Maybe you could think about calling String.split(String regex) ...

Upvotes: 9
