Marquinio

Reputation: 473

String split not returning empty results

I'm trying to use

"value1:value2::value3".split(":");

The problem is that I want the result to include the blank entries.

It returns: [value1, value2, value3]
It should be: [value1, value2, , value3]

Does anyone know the regexp to fix this?

OK, I found the cause of the problem. I'm actually reading a text file, and it contains this line:

123:;~\&:ST02:M:test:M:4540145::type;12:51253:D:2.2:567766::AL:::::::2.2b

When I process this line while reading the text file, it produces the erroneous result mentioned above: the empty results are left out in cases like this: :::::.

But when I use the above line in a test program, it doesn't compile and I get an "invalid escape sequence" error. I think it's because of the "\&".

Is there a workaround to this problem by using a regular expression?

Upvotes: 8

Views: 11419

Answers (9)

casablanca

Reputation: 70721

split does include empty strings in the result; see the String.split documentation. However, by default, trailing empty strings (those at the end of the array) are discarded. If you want to include these as well, try split(":", -1).
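To illustrate the difference, here is a minimal sketch. The class name SplitLimitDemo and its two helper methods are made up for this example. It also uses the line from the question, with the backslash written as \\ so that the string literal compiles:

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Default split: trailing empty strings are dropped.
    static String[] splitDefault(String s) {
        return s.split(":");
    }

    // Negative limit: every field is kept, including trailing empty ones.
    static String[] splitKeepAll(String s) {
        return s.split(":", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitDefault("a:b:c::"))); // [a, b, c]
        System.out.println(Arrays.toString(splitKeepAll("a:b:c::"))); // [a, b, c, , ]

        // "\\&" in Java source is the two characters \& at run time.
        String line = "123:;~\\&:ST02:M:test:M:4540145::type;12:51253:D:2.2:567766::AL:::::::2.2b";
        System.out.println("fields: " + splitKeepAll(line).length);
    }
}
```

Empty strings in the middle of the input are kept either way; the limit only changes what happens to the ones at the end. When the line is read from a file, no escaping is needed, since escape sequences only apply to literals in source code.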

Upvotes: 20

ChuckCottrill

Reputation: 4444

This works:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.File;
import java.io.IOException;

public class split {
    public static void main(String[] args) {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(new File("split.csv")))) {
            String data;
            while ((data = br.readLine()) != null) {
                System.out.println("line:" + data);
                // A negative limit keeps trailing empty strings
                String[] cols = data.split(":", -1);
                System.out.println("count:" + cols.length);
                for (int x = 0; x < cols.length; ++x) {
                    System.out.println("[" + x + "] =(" + cols[x] + ")");
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Here is a test file:

a:b:c:d:e
a:b:c:d:
a:b:c::
a:b::: 
a::::
::::
::::e
:::d:e
::c:d:e
:b:c:d:e
a:b:c:d:e

Upvotes: 0

AHungerArtist

Reputation: 9609

That should work, but give StringTokenizer a go if you're still having issues.

Upvotes: 0

mezmo

Reputation: 2480

I think that a StringTokenizer might work better for you, YMMV.

Upvotes: 2

ColinD

Reputation: 110104

Using Guava's Splitter class:

Iterable<String> split = Splitter.on(':').split("value1:value2::value3");

Splitter does not omit empty results by default, though you can make one that does. That said, from what others are saying, what you're doing should work as well.

Upvotes: 1

f1sh

Reputation: 11942

public static void main(String[] args){
  String[] arr = "value1:value2::value3".split(":");
  for(String elm:arr){
    System.out.println("'"+elm+"',");
  }
  System.out.println(arr.length);
}

prints

'value1',
'value2',
'',
'value3',
4

Which is exactly what you want. Your mistake is somewhere else...

Upvotes: 1

Matthew Flynn

Reputation: 2238

Use a negative limit in your split statement:

String str = "val1:val2::val3";
String[] st = str.split(":", -1);
for (int i = 0; i< st.length; i++)
    System.out.println(st[i]);

Results:

val1
val2

val3

Upvotes: 1

crazyscot

Reputation: 12019

Works for me.

class t {
    public static void main(String[] args) {
        String t1 = "value1:value2::value3";
        String[] t2 = t1.split(":");
        System.out.println("t2 has "+t2.length+" elements");
        for (String tt : t2) System.out.println("\""+tt+"\"");
    }
}

gives the output

$ java t
t2 has 4 elements
"value1"
"value2"
""
"value3"

Upvotes: 4

Bill K

Reputation: 62789

I honestly don't see the big draw of split. StringTokenizer works just as well for most things like this and can easily send back the delimiters as tokens (so you can tell there was nothing in between "::").

I just wish it worked a little better with the enhanced for loop, but that aside, it wouldn't hurt to give it a try.

I think there is a regexp trick to get your matched tokens returned as well, but I've gone 20 years without learning regexp and it has still never been the best answer to any problem I've tackled. (Not that I would actually know, since I don't ever use it, but the non-regexp solutions are generally too easy to beat.)
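A rough sketch of that StringTokenizer approach, assuming a single-character delimiter. The class and method names here are invented for illustration. Passing true as the third constructor argument makes the tokenizer return the delimiters as tokens, so two delimiters in a row can be recorded as an empty field:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerFieldsDemo {
    // Splits line on a single delimiter character, keeping empty fields.
    static List<String> fields(String line, char delim) {
        String d = String.valueOf(delim);
        List<String> out = new ArrayList<>();
        // returnDelims = true: the delimiters come back as tokens too
        StringTokenizer st = new StringTokenizer(line, d, true);
        boolean lastWasDelim = true; // the start of the line acts as a field boundary
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            if (tok.equals(d)) {
                if (lastWasDelim) {
                    out.add(""); // nothing between two boundaries
                }
                lastWasDelim = true;
            } else {
                out.add(tok);
                lastWasDelim = false;
            }
        }
        if (lastWasDelim) {
            out.add(""); // the line ended on a delimiter (or was empty)
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("value1:value2::value3", ':')); // [value1, value2, , value3]
    }
}
```

Without the returnDelims flag, StringTokenizer silently skips consecutive delimiters, so the empty fields would be lost. That bookkeeping is the trade-off compared with split(":", -1).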

Upvotes: 1
