Student

Reputation: 349

Why do spaces appear as substrings in this split call?

I have a string containing spaces plus some non-informative characters and substrings that I need to exclude, keeping only the important sections. I used split as below:

String myString[]={"01: Hi       you look tired today?  Can I help you?"};
myString=myString[0].split("[\\s+]");// Split based on any white spaces
for(int ii=0;ii<myString.length;ii++) 
    System.out.println(myString[ii]);

The result is:

01:
Hi






you
look
tired
today?

Can
I
help
you?

Empty strings appear in the result when the regex is "[\s+]" but disappear when the regex is "\s+". I am confused and could not find an answer in the related Stack Overflow pages. The regex Pattern link confused me even more. Please help, I am new to Java.

Edit (19/1/2015):

Following your valuable advice, I have reached a point in my program where a conditional statement needs to be decomposed and processed. The case I have is:

String s1="01:IF   rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;";
String [] s2=s1.split(("[\\s\\&\\,]+"));
for(int ii=0;ii<s2.length;ii++)System.out.println(s2[ii]);

The result is fine so far:

01:IF
rd.h
dq.L
o.LL
v.L
THEN
la.VHB
av.VHR
with
0.4610;

My next step is to add the string "with" to the regex so that this word is dropped during the split. I tried it this way:

String s1="01:IF   rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;";
String [] s2=s1.split(("[\\s\\&\\, with]+"));
for(int ii=0;ii<s2.length;ii++)System.out.println(s2[ii]);

The result is not right, because I get an unwanted extra split at every lowercase "h", which also removes the letter itself:

01:IF
rd.
dq.L
o.LL
v.L
THEN
la.VHB  
av.VHR
0.4610;

Any advice on how to write a regex that splits on a whole word as well as on mixed whitespace and separator characters? Many thanks.

Upvotes: 1

Views: 138

Answers (2)

Bohemian

Reputation: 425188

Your biggest problem is not understanding enough about regular expressions to write them properly. The key point you are missing is that [...] is a character class: a list of characters, any one of which can match. For example:

  • [abc] matches either a, b or c (it does not match "abc")
  • [\\s+] matches any whitespace or "+" character
  • [with] matches a single character that is either w, i, t or h
  • [.$&^?] matches those literal characters - most characters lose their special regex meaning when in a character class
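
To check these rules yourself, here is a minimal sketch using `Pattern.matches`, which tests whether the whole input matches the regex. The class name `CharClassDemo` and the helper `fullMatch` are just illustrative names, not part of your code.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Returns true only when the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("[abc]", "b"));     // true: one character from the set
        System.out.println(fullMatch("[abc]", "abc"));   // false: a class matches a single character
        System.out.println(fullMatch("[\\s+]", "+"));    // true: '+' is a literal inside brackets
        System.out.println(fullMatch("[with]", "h"));    // true: 'h' is one of w, i, t, h
        System.out.println(fullMatch("[with]", "with")); // false: the class is not the word "with"
        System.out.println(fullMatch("[.$&^?]", "?"));   // true: '?' is a literal inside brackets
    }
}
```

The `[with]` case is what causes the split on every lowercase "h" in the question.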

To split on any run of whitespace, commas and ampersands, and also consume "with" when it appears between separators, do this:

String [] s2 = s1.split("[\\s,&]+(with[\\s,&]+)?");
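
As a quick check, here is that split applied to the string from the question. The wrapper class `SplitWithDemo` and the method `tokens` are names I made up for the sketch.

```java
import java.util.Arrays;

public class SplitWithDemo {
    // Splits on runs of whitespace, ',' and '&'; the optional group also
    // swallows the word "with" when separators follow it.
    static String[] tokens(String s) {
        return s.split("[\\s,&]+(with[\\s,&]+)?");
    }

    public static void main(String[] args) {
        String s1 = "01:IF   rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;";
        // prints [01:IF, rd.h, dq.L, o.LL, v.L, THEN, la.VHB, av.VHR, 0.4610;]
        System.out.println(Arrays.toString(tokens(s1)));
    }
}
```

Because "with" sits in a group rather than in a character class, it only matches as the whole word, so the "h" in "rd.h" is left alone.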

You can try it out easily in an online regex tester, which also gives useful explanations of each part of the pattern.

Upvotes: 0

1010

Reputation: 1848

Inside square brackets, [\s+] is the whitespace character class with the plus sign added to it. It matches only one character at a time, so a run of consecutive spaces produces many empty strings, as Todd noted, and a literal + will also act as a separator.

You should use \s+ (without brackets) as the separator. That means one or more whitespace characters.

myString=myString[0].split("\\s+");
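
A small side-by-side sketch of the two patterns. The input "a   b+c" (three spaces) is a made-up example chosen to show both effects, and `SplitCompareDemo` is just an illustrative class name.

```java
import java.util.Arrays;

public class SplitCompareDemo {
    // Character class: every single whitespace or '+' character is a separator.
    static String[] splitBracketed(String s) {
        return s.split("[\\s+]");
    }

    // Quantified \s: one or more whitespace characters form one separator.
    static String[] splitRun(String s) {
        return s.split("\\s+");
    }

    public static void main(String[] args) {
        String input = "a   b+c"; // three spaces between a and b
        System.out.println(Arrays.toString(splitBracketed(input))); // prints [a, , , b, c]
        System.out.println(Arrays.toString(splitRun(input)));       // prints [a, b+c]
    }
}
```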

Upvotes: 2
