Jim

Reputation: 19552

What is a good way to split strings here?

I have the following string:
A:B:1111;domain:80;a;b
The A is optional, so B:1111;domain:80;a;b is also valid input.
The :80 is optional as well, so B:1111;domain;a;b and :1111;domain;a;b are also valid input.
What I want is to end up with a String[] that has:

s[0] = "A";  
s[1] = "B";  
s[2] = "1111";  
s[3] = "domain:80";  
s[4] = "a";  
s[5] = "b";  

I did this as follows:

List<String> tokens = new ArrayList<String>();  
String[] values = s.split(";");  
String[] actions = values[0].split(":");   

for(String a:actions){  
    tokens.add(a);  
}  
//Start from 1 to skip A:B:1111
for(int i = 1; i < values.length; i++){  
    tokens.add(values[i]);  
}  
String[] finalResult = tokens.toArray(new String[0]);

Is there a better way to do this? How could I do it more efficiently?

Upvotes: 7

Views: 1995

Answers (5)

Has QUIT--Anony-Mousse

Reputation: 77454

There are not many efficiency concerns here; everything I see is linear.

Anyway, you could either use a regular expression or a manual tokenizer.

You can avoid the list. You know the length of values and actions, so you can do

String[] values = s.split(";");  
String[] actions = values[0].split(":");
String[] result = new String[actions.length + values.length - 1];
System.arraycopy(actions, 0, result, 0, actions.length);
System.arraycopy(values, 1, result, actions.length, values.length - 1);
return result;

It should be reasonably efficient, unless you insist on implementing split yourself.

Untested low-level approach (make sure to unit test and benchmark before use):

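// Separator characters, as char, not string.
final static int s1 = ':';
final static int s2 = ';';
// A ':' only counts as a separator before the first ';'.
int limit = s.indexOf(s2);
if (limit < 0) limit = s.length();
// Compute required size:
int components = 1;
int p = s.indexOf(s1);
if (p < 0 || p > limit) p = s.indexOf(s2);
while (p > -1) {
  components++;
  int c = s.indexOf(s1, p + 1);
  p = (c > -1 && c < limit) ? c : s.indexOf(s2, p + 1);
}
String[] result = new String[components];
// Build result
int in = 0, i = 0, out = s.indexOf(s1);
if (out < 0 || out > limit) out = s.indexOf(s2);
while (out > -1) {
  result[i] = s.substring(in, out);
  i++;
  in = out + 1;
  int c = s.indexOf(s1, in);
  out = (c > -1 && c < limit) ? c : s.indexOf(s2, in);
}
// out is -1 here, so the last component runs to the end of the string.
assert(i == result.length - 1);
result[i] = s.substring(in, s.length());
return result;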

Note: this code is optimized, in a slightly crazy way, so that it treats a : as a separator only in the first component. Handling the last component is a bit tricky, as out will have the value -1.

I would usually not use this last approach unless performance and memory are extremely crucial. Most likely there are still some bugs in it, and the code is fairly unreadable, in particular compared to the one above.

Upvotes: 2

If you want to keep the domain and port together, then I believe you will need two splits. You may be able to do it with some regex magic, but I doubt you will see any real performance gain from it.

If you do not mind splitting the domain and port, then:

  String s= "A:B:1111;domain:80;a;b";
  List<String> tokens = new ArrayList<String>();
  String[] values = s.split(";|:");

  for(String a : values){
      tokens.add(a);
  }
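
To make the trade-off concrete, here is a small sketch of what splitting on both delimiters produces for input with and without the port. The class and method names (SplitBothDemo, splitAll) are made up for illustration, and "[;:]" is just an equivalent spelling of ";|:".

```java
import java.util.Arrays;

class SplitBothDemo {

    // Hypothetical helper: split on every ';' and every ':'.
    static String[] splitAll(String s) {
        return s.split("[;:]");
    }

    public static void main(String[] args) {
        // With a port: the port becomes its own token at index 4.
        System.out.println(Arrays.toString(splitAll("A:B:1111;domain:80;a;b")));
        // prints [A, B, 1111, domain, 80, a, b]

        // Without a port: index 4 now holds "a" instead.
        System.out.println(Arrays.toString(splitAll("A:B:1111;domain;a;b")));
        // prints [A, B, 1111, domain, a, b]
    }
}
```

Since both A and the port are optional, the position of a token no longer tells you what it is, so the caller has to work that out from the array length or the token contents.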

Upvotes: 0

Ina

Reputation: 4470

With some assumptions about acceptable characters, this regex provides validation as well as splitting into the groups you desire.

Pattern p = Pattern.compile("^((.+):)?(.+):(\\d+);(.+):(\\d+);(.+);(.+)$");
Matcher m = p.matcher("A:B:1111;domain:80;a;b");
if(m.matches())
{
    for(int i = 0; i <= m.groupCount(); i++)
        System.out.println(m.group(i));
}
m = p.matcher("B:1111;domain:80;a;b");
if(m.matches())
{
    for(int i = 0; i <= m.groupCount(); i++)
        System.out.println(m.group(i));
}

Gives:

A:B:1111;domain:80;a;b // ignore this
A: // ignore this
A // This is the optional A, check for null
B
1111
domain
80
a
b

And

B:1111;domain:80;a;b // ignore this
null // ignore this
null // This is the optional A, check for null
B
1111
domain
80
a
b
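
If the port should be optional and stay attached to the domain, as in the question, a variation along these lines might work. This is only a sketch: the grammar in the comment is my reading of the question, and the names RegexSplitDemo and parse are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSplitDemo {

    // Assumed grammar: [A:]B:digits;domain[:digits];a;b
    // Group 4 captures the domain together with its optional port.
    private static final Pattern P = Pattern.compile(
        "^(?:([^:;]*):)?([^:;]*):(\\d+);([^:;]+(?::\\d+)?);([^;]+);([^;]+)$");

    static String[] parse(String s) {
        Matcher m = P.matcher(s);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected format: " + s);
        }
        List<String> tokens = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            if (m.group(i) != null) { // group 1 is null when the optional A is absent
                tokens.add(m.group(i));
            }
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("A:B:1111;domain:80;a;b")));
        // prints [A, B, 1111, domain:80, a, b]
        System.out.println(Arrays.toString(parse("B:1111;domain;a;b")));
        // prints [B, 1111, domain, a, b]
    }
}
```

Using character classes that exclude the separators, instead of .+, keeps each group from running past a delimiter and avoids most of the backtracking.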

Upvotes: 1

Ashwinee K Jha

Reputation: 9307

Unless you have verified that this is a bottleneck in your code, don't worry much about efficiency; the logic here is reasonable. You can avoid creating the temporary ArrayList and instead create the array directly, since you know the required size.
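
A minimal sketch of that idea, keeping the same two splits as the question. The class and method names (TokenSplitter, splitTokens) are placeholders of mine, and the sketch assumes the input contains at least one non-empty component.

```java
import java.util.Arrays;

class TokenSplitter {

    // Hypothetical helper: same two splits as in the question, but no temporary list.
    static String[] splitTokens(String s) {
        String[] values = s.split(";");          // e.g. "A:B:1111", "domain:80", "a", "b"
        String[] actions = values[0].split(":"); // e.g. "A", "B", "1111"

        // The final size is known up front: all actions plus every value except the first.
        String[] result = new String[actions.length + values.length - 1];
        System.arraycopy(actions, 0, result, 0, actions.length);
        System.arraycopy(values, 1, result, actions.length, values.length - 1);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitTokens("A:B:1111;domain:80;a;b")));
        // prints [A, B, 1111, domain:80, a, b]
        System.out.println(Arrays.toString(splitTokens("B:1111;domain;a;b")));
        // prints [B, 1111, domain, a, b]
    }
}
```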

Upvotes: 0

newSpringer

Reputation: 1028

You could do something like:

String str = "A:B:1111;domain:80;a;b";
String[] temp;

/* delimiter */
String delimiter = ";";
/* given string will be split by the argument delimiter provided. */
temp = str.split(delimiter);
/* print substrings */
for (int i = 0; i < temp.length; i++)
    System.out.println(temp[i]);

Upvotes: 0
