user278064

Reputation: 10170

Java Western + Arabic String concatenation issues

I'm having trouble concatenating pieces of text that mix Western and Arabic characters.

I have a list of tokens like this:

-LRB-
دریای
مازندران
-RRB-
,

I use the following procedure to concatenate this list of tokens:

String str = "";
for (String tok : tokens) {
    str += tok + " ";
}

This is the output of my procedure:

-LRB- دریای مازندران -RRB- , 

As you can see, the order of the Arabic words is inverted. How can I fix this (perhaps by telling Java to ignore the text-direction information)?

EDIT

Actually, it seems that my original problem was not a real problem, but now I have a new one. I need to wrap each word in a string of the form (word *), so that my output looks like this:

(word1 *)(word2 *)(word3 *)...

The procedure that I use is the following:

String str = "";
for (String tok : tokens) {
    str += "(" + tok + " *)";
}

However, the result that I got is this:

(-LRB- *)(دریای *)(مازندران *)(-RRB- *)(, *)

instead of:

(-LRB- *)(دریای)(* مازندران *)(-RRB- *)(, *)

EDIT2

Actually, I've discovered that my problem is not a problem at all. I wrote the string to a file and opened it with nano (in the console), and it was concatenated correctly.

So the problem was caused by the Eclipse console (and also gedit), which, let's say, rendered the string incorrectly.
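For anyone hitting the same confusion, here is a minimal sketch of one way to check the stored (logical) order of the string without depending on how a console or editor renders bidirectional text. The class name LogicalOrderCheck and the helper escape are hypothetical names chosen for illustration; the idea is simply to print non-ASCII characters as escapes, so the output is plain left-to-right ASCII.

```java
import java.util.Arrays;
import java.util.List;

public class LogicalOrderCheck {
    // Replaces every non-ASCII UTF-16 char with a backslash-u style escape,
    // so the result is pure ASCII and no renderer can reorder it visually.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("-LRB-", "دریای", "مازندران", "-RRB-", ",");
        StringBuilder sb = new StringBuilder();
        for (String tok : tokens) {
            sb.append('(').append(tok).append(" *)");
        }
        String str = sb.toString();

        // The escaped form shows the characters in the order they are stored.
        System.out.println(escape(str));

        // The first Arabic-script token sits at a lower index than the second,
        // regardless of how the string is displayed.
        System.out.println(str.indexOf(tokens.get(1)) < str.indexOf(tokens.get(2)));
    }
}
```

If the escaped output lists the tokens in list order, the concatenation is fine and any apparent swap comes from the viewer.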

Anyway, thanks for your help!

Upvotes: 2

Views: 1529

Answers (2)

CodeChimp

Reputation: 8154

First, I would suggest using StringBuilder instead of raw String concatenation. Each += in a loop creates a new String object, so StringBuilder will make your garbage collector a lot happier. Second, without seeing the input or how your StringTokenizer is set up, my guess is that the string is not being tokenized properly.
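A minimal sketch of the StringBuilder version, assuming the tokens are already available as a List<String> (the class name TokenWrapper and method name wrap are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class TokenWrapper {
    // Wraps each token as "(token *)" and appends the pieces in list order.
    static String wrap(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String tok : tokens) {
            sb.append('(').append(tok).append(" *)");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("-LRB-", "دریای", "مازندران", "-RRB-", ",");
        System.out.println(wrap(tokens));
    }
}
```

This changes only how the string is built. The characters end up in the same logical order as with +=, so it will not change how a bidi-aware console displays the result.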

Upvotes: 0

Brent Ramerth

Reputation: 315

The output is correct, and if you are presenting this text to an Arabic-speaking user, you should not override the directionality of the text. Arabic is written from right to left, so when you concatenate two Arabic strings, the first will appear to the right of the second. This is controlled by the Unicode Bidirectional Algorithm (BiDi), the details of which are covered in http://www.unicode.org/reports/tr9/.
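To see how the text is split into directional runs, the JDK's java.text.Bidi class can be used. The following is only a rough sketch: the class name BidiRuns and its two helper methods are hypothetical, and the base direction is assumed to default to left-to-right when the text gives no stronger hint.

```java
import java.text.Bidi;

public class BidiRuns {
    // True when the text contains both left-to-right and right-to-left runs.
    static boolean isMixed(String text) {
        return new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT).isMixed();
    }

    // Lists each directional run as a half-open index range in logical order.
    static String describeRuns(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            // Odd embedding levels are right-to-left, even levels left-to-right.
            String dir = (bidi.getRunLevel(i) % 2 == 1) ? "RTL" : "LTR";
            sb.append(dir)
              .append(" [").append(bidi.getRunStart(i))
              .append(", ").append(bidi.getRunLimit(i)).append(")\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String str = "-LRB- دریای مازندران -RRB- , ";
        System.out.println(isMixed(str));
        System.out.print(describeRuns(str));
    }
}
```

The indices reported are positions in the stored string. A renderer that implements the algorithm draws each RTL run from right to left, which is why adjacent Arabic words can look swapped even though their stored order is unchanged.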

Upvotes: 2
