s_puria

Reputation: 407

String.split by semicolon

I want to split a string by semicolon (";"):

String phrase = "‫;‪14/May/2015‬‬ ‫‪FC‬‬ ‫‪Barcelona‬‬ ‫‪VS.‬‬ ‫‪Real‬‬ ‫‪Madrid";
String[] dateSplit = phrase.split(";");
System.out.println("dateSplit[0]:" + dateSplit[0]);
System.out.println("dateSplit[1]:" + dateSplit[1]);

But it removes the ";" from the string and puts the whole string into dateSplit[1], so the output is:

dateSplit[0]:‫
dateSplit[1]:‪14/May/2015‬‬ ‫‪FC‬‬ ‫‪Barcelona‬‬ ‫‪VS.‬‬ ‫‪Real‬‬ ‫‪Madrid


And when I run

System.out.println("Real String :"+phrase);

the printed string is:

Real String :‫;‪14/May/2015‬‬ ‫‪FC‬‬ ‫‪Barcelona‬‬ ‫‪VS.‬‬ ‫‪Real‬‬ ‫‪Madrid

Upvotes: 9

Views: 24409

Answers (3)

T.Gounelle

Reputation: 6033

The phrase contains bidirectional control characters, such as RIGHT-TO-LEFT EMBEDDING (U+202B). That is why some editors don't manage to display the string correctly.
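To see which invisible characters a string really contains without hard-coding their code points, one option is to list every code point whose Unicode general category is FORMAT (Cf), the category the bidi controls belong to. This is a minimal sketch: the class name `BidiInspector` and the helper `findFormatChars` are hypothetical, and the sample string is an assumed reconstruction of the start of the question's phrase, written with `\u` escapes so the hidden characters are explicit.

```java
import java.util.ArrayList;
import java.util.List;

public class BidiInspector {
    // Hypothetical helper: returns every code point in s whose general category
    // is Cf (FORMAT), which covers bidi controls such as RLE, LRE and PDF.
    public static List<Integer> findFormatChars(String s) {
        List<Integer> found = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.getType(cp) == Character.FORMAT) {
                found.add(cp);
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed reconstruction of the beginning of the question's phrase.
        String phrase = "\u202B;\u202A14/May/2015\u202C\u202C \u202B\u202AFC";
        for (int cp : findFormatChars(phrase)) {
            // Character.getName gives the Unicode name, e.g. RIGHT-TO-LEFT EMBEDDING
            System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
        }
    }
}
```

Filtering on the general category rather than on three fixed values also catches other invisible formatting characters (for example the directional isolates or zero-width characters) if they ever show up in the data.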

This piece of code shows the actual characters in the String (the phrase may not display correctly here for everyone, but it compiles and looks fine in Eclipse). I just translate LEFT-TO-RIGHT EMBEDDING (U+202A) into ->, RIGHT-TO-LEFT EMBEDDING (U+202B) into <- and POP DIRECTIONAL FORMATTING (U+202C) into ^:

public static void main(String[] args) {
    String phrase = "‫;‪14/May/2015‬‬ ‫‪FC‬‬ ‫‪Barcelona‬‬ ‫‪VS.‬‬ ‫‪Real‬‬ ‫‪Madrid";
    String[] dateSplit = phrase.split(";");
    for (String d : dateSplit) {
        System.out.println(d);
    }
    char[] c = phrase.toCharArray();
    StringBuilder p = new StringBuilder();
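    for (int i = 0; i < c.length; i++) {
        int code = Character.codePointAt(c, i);
        switch (code) {
        case 0x202A: // 8234, LEFT-TO-RIGHT EMBEDDING
            p.append(" -> ");
            break;
        case 0x202B: // 8235, RIGHT-TO-LEFT EMBEDDING
            p.append(" <- ");
            break;
        case 0x202C: // 8236, POP DIRECTIONAL FORMATTING
            p.append(" ^ ");
            break;
        default:     // any other character is copied unchanged
            p.append(c[i]);
        }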
    }
    System.out.println(p.toString());
}

Prints:

<- ; -> 14/May/2015 ^ ^ <- -> FC ^ ^ <- -> Barcelona ^ ^ <- -> VS. ^ ^ <- -> Real ^ ^ <- -> Madrid

String#split() works on the actual characters of the string, not on what the editor displays. As the output above shows, the ; is the second character, right after a right-to-left embedding, so dateSplit[0] holds only that invisible character and the split gives (beware of the display again: the ; is not part of the string in dateSplit[1]):

dateSplit[0] = "";
dateSplit[1] = "14/May/2015‬‬ ‫‪FC‬‬ ‫‪Barcelona‬‬ ‫‪VS.‬‬ ‫‪Real‬‬ ‫‪Madrid";
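To confirm that dateSplit[0] only looks empty, one can print the code point of every character in each part. This is a minimal sketch: `SplitCheck` and `describe` are hypothetical names, and the input is an assumed reconstruction of the first part of the phrase (RLE, `;`, LRE, the date, two PDFs), written with `\u` escapes.

```java
public class SplitCheck {
    // Hypothetical helper: renders every char of s as U+XXXX so that
    // invisible characters become visible.
    public static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (char ch : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) ch));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed reconstruction of the start of the question's phrase.
        String phrase = "\u202B;\u202A14/May/2015\u202C\u202C";
        String[] dateSplit = phrase.split(";");
        System.out.println(dateSplit.length);                        // 2
        System.out.println(describe(dateSplit[0]));                  // U+202B
        System.out.println(describe(dateSplit[1].substring(0, 2)));  // U+202A U+0031
    }
}
```

So dateSplit[0] is a one-character string containing the RIGHT-TO-LEFT EMBEDDING, and dateSplit[1] starts with a LEFT-TO-RIGHT EMBEDDING before the `1` of the date.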

I guess you are processing data in a right-to-left language, mixed with football team names that are written left-to-right. The solution is certainly to get rid of the directional characters and put the ; in the right place, i.e. as a separator between the tokens.
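A minimal sketch of the first half of that fix, assuming the only unwanted characters are the explicit bidi controls U+202A to U+202E and the isolates U+2066 to U+2069 (other invisible characters, such as zero-width spaces, would have to be added to the pattern). `StripBidi` and `stripBidi` are hypothetical names, and the phrase is reconstructed from the annotated output above. Once the controls are removed, printing the string shows where the ; really sits; moving it to its intended position still has to happen wherever the data is produced.

```java
public class StripBidi {
    // Assumption: these ranges cover the bidi controls present in the data.
    // The doubled backslashes make the regex engine, not the Java compiler,
    // interpret the code point escapes.
    private static final String BIDI_CONTROLS = "[\\u202A-\\u202E\\u2066-\\u2069]";

    // Hypothetical helper: removes every bidi control character from s.
    public static String stripBidi(String s) {
        return s.replaceAll(BIDI_CONTROLS, "");
    }

    public static void main(String[] args) {
        // Phrase reconstructed from the annotated output, with escapes
        // so that every invisible control character is explicit.
        String phrase = "\u202B;\u202A14/May/2015\u202C\u202C \u202B\u202AFC\u202C\u202C"
                + " \u202B\u202ABarcelona\u202C\u202C \u202B\u202AVS.\u202C\u202C"
                + " \u202B\u202AReal\u202C\u202C \u202B\u202AMadrid";

        String clean = stripBidi(phrase);
        // With the controls gone, it is visible that ';' is the first character.
        System.out.println("clean: " + clean);

        String[] parts = clean.split(";");
        for (int i = 0; i < parts.length; i++) {
            System.out.println("parts[" + i + "]: \"" + parts[i] + "\"");
        }
    }
}
```

This prints `clean: ;14/May/2015 FC Barcelona VS. Real Madrid`, then an empty `parts[0]` and the full text in `parts[1]`: the same split as before, but now what you see is what `split` sees.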

Upvotes: 11

Naman Gala

Reputation: 4692

I retyped your code instead of copying it from here, and it works perfectly fine:

public static void main(String[] args) {
    String phrase = "14/May/2015; FC Barcelona VS. Real Madrid";
    String[] dateSplit = phrase.split(";");
    System.out.println("dateSplit[0]:" + dateSplit[0]);
    System.out.println("dateSplit[1]:" + dateSplit[1]);
}


Upvotes: 1

Steve Chaloner

Reputation: 8202

Cutting and pasting your code into IntelliJ screwed up the editor; as @Palcente said, there may be encoding issues.

However, I would recommend using a StringTokenizer instead.

StringTokenizer sTok = new StringTokenizer(phrase, ";");

You can then iterate over it, which leads to nicer (and safer) code.
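Iterating over the tokens could look like the sketch below (`TokenizerDemo` and `tokens` are hypothetical names). Two caveats worth knowing: the `StringTokenizer` Javadoc describes it as a legacy class whose use is discouraged in new code in favour of `String.split` or `java.util.regex`, and, unlike `split`, it never returns empty tokens. It also leaves the invisible directional characters in place, so on its own it does not fix the problem in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDemo {
    // Collects every token StringTokenizer produces with ';' as the only delimiter.
    public static List<String> tokens(String phrase) {
        List<String> result = new ArrayList<>();
        StringTokenizer sTok = new StringTokenizer(phrase, ";");
        while (sTok.hasMoreTokens()) {
            result.add(sTok.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Clean input, as in the retyped version from the previous answer.
        for (String token : tokens("14/May/2015; FC Barcelona VS. Real Madrid")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

This prints `[14/May/2015]` and `[ FC Barcelona VS. Real Madrid]`; the leading space of the second token is kept, so a `trim()` may be wanted.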

Upvotes: 0
