justin3250

Reputation: 323

Escape special characters in java

I have a text file that uses | (pipe) as the separator. If a column's value itself contains |, splitting the line creates an extra column.

Example :

name|date|age
zzz|20-03-22|23
"xx|zz"|23-23-33|32

How can I escape the separator character inside the double quotes ""? Also, how do I escape the regular expression used in the split so that it works for user-specified delimiters? I have tried:

String[] cols = line.split("\\|");
System.out.println("lets see column only==" + cols[1]);

Upvotes: 1

Views: 4910

Answers (4)

aioobe

Reputation: 421220

How can I escape the character within the double quotes ""

Here's one approach:

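import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "\"xx|zz\"|23-23-33|32";

// Find each double-quoted section and escape the pipes inside it.
Matcher m = Pattern.compile("\"[^\"]*\"").matcher(str);
StringBuffer sb = new StringBuffer();
while (m.find())
    // "\\\\|" reaches appendReplacement as \\| and is emitted as \|
    m.appendReplacement(sb, m.group().replace("|", "\\\\|"));

// Copy the remaining, unquoted part of the input.
m.appendTail(sb);

System.out.println(sb);  // prints "xx\|zz"|23-23-33|32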

In order to get the columns back you'd do something like this:

String str = "\"xx\\|zz\"|23-23-33|32";
String[] cols = str.split("(?<!\\\\)\\|");

for (String col : cols)
    System.out.println(col.replace("\\|", "|"));

Regarding your edit:

how to escape the regular expression used in the split, so that it works for user-specified delimiters

You should use Pattern.quote on the string you want to split on:

String[] cols = line.split(Pattern.quote(delimiter));

This will ensure that the split works as intended even if delimiter contains special regex-symbols such as . or |.
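As a minimal sketch of that idea, the helper below wraps the call in a method and tries it with a few delimiters. The class name DelimiterSplit, the method name splitOn, and the sample inputs are made up for illustration. The -1 limit is an extra assumption that is not part of the answer above: it keeps trailing empty columns, which split drops by default.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class DelimiterSplit {
    // Hypothetical helper: splits a line on a literal, user-supplied delimiter.
    static String[] splitOn(String line, String delimiter) {
        // Pattern.quote makes every character of the delimiter literal,
        // so "|" and "." are no longer treated as regex operators.
        // The -1 limit (an assumption) keeps trailing empty columns.
        return line.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOn("zzz|20-03-22|23", "|"))); // [zzz, 20-03-22, 23]
        System.out.println(Arrays.toString(splitOn("a.b.c", ".")));           // [a, b, c]
        System.out.println(Arrays.toString(splitOn("a||b", "||")));           // [a, b]
    }
}
```

For comparison, "a.b.c".split(".") without quoting returns an empty array, because the unquoted . matches every character.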

Upvotes: 3

Prince John Wesley

Reputation: 63708

Here is one way to parse it

    String str = "zzz|20-03-22|23 \"xx|zz\"|23-23-33|32";
    String regex = "(?<=^|\\|)(([^\"]*?)|([^\"]+\"[^\"]+\".*?))(?=\\||$)";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(str); 
    while(m.find()) {
        System.out.println(m.group());
    }   

Output:

zzz
20-03-22
23 "xx|zz"
23-23-33
32

Upvotes: 0

Emmanuel Bourg

Reputation: 11058

You can use a CSV parser like OpenCSV or Commons CSV.

Upvotes: 1

Bozho

Reputation: 597362

You can replace it with its Unicode sequence (prior to delimiting with the pipe).

But what you should do is adjust your parser to take that into account, rather than changing the files.
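One way to adjust the parser might look like the sketch below: a small character-by-character scanner that only treats the delimiter as a separator when it is outside double quotes. The class and method names are hypothetical, and the sketch assumes a single-character delimiter and fields that never contain an escaped or doubled quote. For anything beyond that, a real CSV library is the safer choice.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareSplitter {
    // Hypothetical helper: splits on the delimiter, ignoring delimiters
    // that appear between a pair of double quotes.
    // Assumption: quotes are never escaped or doubled inside a field.
    static List<String> split(String line, char delimiter) {
        List<String> cols = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // Toggle the quoted state; the quote character itself is dropped.
                inQuotes = !inQuotes;
            } else if (c == delimiter && !inQuotes) {
                // An unquoted delimiter ends the current column.
                cols.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // Add the last column, which has no trailing delimiter.
        cols.add(current.toString());
        return cols;
    }

    public static void main(String[] args) {
        System.out.println(split("\"xx|zz\"|23-23-33|32", '|')); // [xx|zz, 23-23-33, 32]
        System.out.println(split("zzz|20-03-22|23", '|'));       // [zzz, 20-03-22, 23]
    }
}
```

The input file stays untouched with this approach, and the surrounding quotes are stripped from the returned values.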

Upvotes: 1
