I'm having difficulty splitting a string so that whitespace is kept but all other non-characters are removed. For a school task I have to read the text in with BufferedReader,
and it contains lots of characters that even Eclipse couldn't display. The elements I read in have the form element1;element 2; element 3 (Element 4; Element 5 $Element 6 etc., and one of the delimiters to remove is ";".
I've tried .split("\\W"),
but this also split on all the whitespace, and some elements came out completely empty, although it did remove the unwanted characters well.
Right now I'm using .split("[;(),$]"),
but this does not work properly because there are still characters left that I can't identify.
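To make the two attempts concrete, here is a small sketch that runs both on part of the sample input above. The class and method names are hypothetical, chosen only for illustration; it shows that \W also matches the space and that two adjacent delimiters ("; ") produce an empty element, while the explicit list keeps the spaces but also keeps the whitespace around each element.

```java
import java.util.Arrays;

// Reproduces the two attempts from the question on part of the sample input.
class QuestionSplits {
    static final String SAMPLE = "element1;element 2; element 3";

    // Attempt 1: \W is any non-word character, and the space is one of them.
    static String[] splitOnNonWord(String s) {
        return s.split("\\W");
    }

    // Attempt 2: only the listed delimiters are removed; spaces are kept.
    static String[] splitOnListed(String s) {
        return s.split("[;(),$]");
    }

    public static void main(String[] args) {
        // Prints [element1, element, 2, , element, 3]: "element 2" is broken up,
        // and "; " (two adjacent delimiters) leaves an empty element between them.
        System.out.println(Arrays.toString(splitOnNonWord(SAMPLE)));
        // Prints [element1, element 2,  element 3]: the space before "element 3" survives.
        System.out.println(Arrays.toString(splitOnListed(SAMPLE)));
    }
}
```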
If, as you say, \\W worked fine for you and the only problem was that it also split on whitespace, then you can use the intersection of \\W and \\S, which removes all whitespace characters from \\W.
Use split("[\\W&&\\S]+")
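A minimal sketch of the intersection on its own, assuming the sample input from the question (the class name is hypothetical). Each piece is printed in brackets so it is visible that spaces inside an element survive, and so does the space next to a removed delimiter, which the next step deals with.

```java
// Splits on runs of characters that are non-word AND non-whitespace,
// i.e. \W with the whitespace characters taken out.
class IntersectionSplit {
    static String[] split(String s) {
        return s.split("[\\W&&\\S]+");
    }

    public static void main(String[] args) {
        String data = "element1;element 2; element 3 (Element 4";
        for (String part : split(data)) {
            // Brackets make any leading/trailing spaces in a piece visible.
            System.out.println("[" + part + "]");
        }
    }
}
```

This prints [element1], [element 2], [ element 3 ] and [Element 4]: no more empty elements, but " element 3 " still carries the spaces that surrounded the removed "; " and "(".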
To also remove the whitespace surrounding results like _element 3
(where _
represents whitespace), you can surround the regex with \\s*
. To make the predefined character classes support Unicode, add the (?U)
flag to the regex.
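To see what (?U) (the embedded form of Pattern.UNICODE_CHARACTER_CLASS) changes, here is a sketch comparing the pattern with and without the flag. The accented input is a made-up example, not from the question; it is written with \u escapes to avoid source-encoding problems.

```java
import java.util.Arrays;

// Compares the same split pattern with and without the (?U) flag.
class UnicodeFlagSplit {
    static final String PATTERN = "\\s*[\\W&&\\S]+\\s*";

    static String[] asciiSplit(String s) {
        return s.split(PATTERN);
    }

    static String[] unicodeSplit(String s) {
        return s.split("(?U)" + PATTERN);
    }

    public static void main(String[] args) {
        // Hypothetical input containing accented letters (U+00E9, U+00C9).
        String data = "\u00e9l\u00e9ment 1; \u00c9l\u00e9ment 2";
        // Without (?U), the accented letters count as \W, so the words get cut apart.
        System.out.println(Arrays.toString(asciiSplit(data)));
        // With (?U), \W follows Unicode rules and only "; " is removed.
        System.out.println(Arrays.toString(unicodeSplit(data)));
    }
}
```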
Demo:
String data = "element1;element 2; element 3 (Element 4; Element 5 $Element 6 ";
// Split on runs of non-word, non-whitespace characters, swallowing the whitespace around them
for (String s : data.split("(?U)\\s*[\\W&&\\S]+\\s*")) {
    System.out.println(s);
}
Output:
element1
element 2
element 3
Element 4
Element 5
Element 6
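Since the question reads the text with BufferedReader, here is a sketch of applying the same pattern line by line. It assumes one or more elements per line; StringReader stands in for the real file, and the second input line and the name readElements are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Reads text line by line and splits each line with the pattern from the answer.
class ReadAndSplit {
    static final String DELIMITERS = "(?U)\\s*[\\W&&\\S]+\\s*";

    static List<String> readElements(BufferedReader reader) throws IOException {
        List<String> elements = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            for (String part : line.split(DELIMITERS)) {
                // trim() drops spaces left at the start/end of a line; empty pieces
                // (e.g. from a line that starts with a delimiter) are skipped.
                String element = part.trim();
                if (!element.isEmpty()) {
                    elements.add(element);
                }
            }
        }
        return elements;
    }

    public static void main(String[] args) throws IOException {
        String text = "element1;element 2; element 3 (Element 4; Element 5 $Element 6 \n(Element 7; element 8";
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            for (String element : readElements(reader)) {
                System.out.println(element);
            }
        }
    }
}
```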
Instead of trying to split on all the characters you don't want, you could list the characters you do want, e.g.
String[] words = s.split("[^ a-zA-Z0-9]+");
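Note: the ^ at the start of a character class means "anything except these characters", so this splits on every run of characters that are not a space, an ASCII letter or a digit.
BTW: strictly speaking, none of these characters are non-characters.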
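A sketch of this whitelist approach on the sample input (class and method names are hypothetical). Because the space is whitelisted, pieces keep the spaces next to a removed delimiter, so they are trimmed before printing. The \p{L} variant is an assumption added here, not part of the answer: a-zA-Z only covers ASCII, so accented letters would otherwise be treated as delimiters.

```java
// Splits on runs of anything that is not a whitelisted character.
class WhitelistSplit {
    // Whitelist from the answer: space, ASCII letters, digits.
    static String[] asciiWhitelist(String s) {
        return s.split("[^ a-zA-Z0-9]+");
    }

    // Assumed variant: \p{L} (any Unicode letter) instead of a-zA-Z.
    static String[] unicodeWhitelist(String s) {
        return s.split("[^ \\p{L}0-9]+");
    }

    public static void main(String[] args) {
        String data = "element1;element 2; element 3 (Element 4; Element 5 $Element 6";
        for (String part : asciiWhitelist(data)) {
            // trim() removes the spaces that sat next to a removed delimiter.
            System.out.println("[" + part.trim() + "]");
        }
    }
}
```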