Reputation: 323
I have a text file that uses | (pipe) as the separator. If a column's value itself contains |, the split treats that | as a separator too and creates an extra column.
Example:
name|date|age
zzz|20-03-22|23
"xx|zz"|23-23-33|32
How can I escape the | character within the double quotes ""?
Edit: how do I escape the regular expression used in the split, so that it works for user-specified delimiters?
I have tried:
String[] cols = line.split("\\|");
System.out.println("lets see column only=="+cols[1]);
Upvotes: 1
Views: 4910
Reputation: 421220
How can I escape the | character within the double quotes ""?
Here's one approach:
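// Requires java.util.regex.Matcher and java.util.regex.Pattern.
String str = "\"xx|zz\"|23-23-33|32";
Matcher m = Pattern.compile("\"[^\"]*\"").matcher(str); // matches each double-quoted run
StringBuffer sb = new StringBuffer();
while (m.find())
    // "\\\\|" is the replacement text \\|, which appendReplacement emits as \|
    m.appendReplacement(sb, m.group().replace("|", "\\\\|"));
m.appendTail(sb);
System.out.println(sb); // prints "xx\|zz"|23-23-33|32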
In order to get the columns back you'd do something like this:
String str = "\"xx\\|zz\"|23-23-33|32";
// Split only on pipes that are not preceded by a backslash.
String[] cols = str.split("(?<!\\\\)\\|");
for (String col : cols)
    System.out.println(col.replace("\\|", "|")); // undo the escaping
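For reference, the two steps above can be combined into one runnable class. This is only a sketch: the class name EscapeRoundTrip and the method names escapeQuotedPipes and splitEscaped are made up for illustration, and it swaps the hand-written "\\\\|" replacement for Matcher.quoteReplacement, which should behave the same here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are not from the original answer.
final class EscapeRoundTrip {
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    // Step 1: put a backslash before every pipe inside a double-quoted run.
    static String escapeQuotedPipes(String line) {
        Matcher m = QUOTED.matcher(line);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String escaped = m.group().replace("|", "\\|");
            // quoteReplacement keeps backslashes and dollar signs literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(escaped));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Step 2: split on pipes not preceded by a backslash, then undo the escaping.
    static List<String> splitEscaped(String escapedLine) {
        List<String> cols = new ArrayList<>();
        for (String col : escapedLine.split("(?<!\\\\)\\|")) {
            cols.add(col.replace("\\|", "|"));
        }
        return cols;
    }

    public static void main(String[] args) {
        String escaped = escapeQuotedPipes("\"xx|zz\"|23-23-33|32");
        System.out.println(escaped);               // prints "xx\|zz"|23-23-33|32
        System.out.println(splitEscaped(escaped)); // prints ["xx|zz", 23-23-33, 32]
    }
}
```

Note that the surrounding double quotes stay in the first column; strip them afterwards if you only want the value.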
Regarding your edit:
how do I escape the regular expression used in the split, so that it works for user-specified delimiters?
You should use Pattern.quote on the string you want to split on:
String[] cols = line.split(Pattern.quote(delimiter));
This will ensure that the split works as intended even if delimiter contains special regex symbols such as . or |.
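A small self-contained sketch of that, in case it helps. The class name LiteralSplitDemo and the helper splitLiteral are made up for the example; only String.split and Pattern.quote come from the JDK.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original answer.
final class LiteralSplitDemo {
    // Splits a line on the delimiter taken as literal text, not as a regex.
    static String[] splitLiteral(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // "|" and "." are regex metacharacters, but quoting makes them literal.
        System.out.println(Arrays.toString(splitLiteral("zzz|20-03-22|23", "|"))); // prints [zzz, 20-03-22, 23]
        System.out.println(Arrays.toString(splitLiteral("a.b.c", ".")));           // prints [a, b, c]
    }
}
```

Without the quoting, "." would match every character and "|" would be read as an empty alternation, so neither would split where you expect.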
Upvotes: 3
Reputation: 63708
Here is one way to parse it:
String str = "zzz|20-03-22|23 \"xx|zz\"|23-23-33|32";
String regex = "(?<=^|\\|)(([^\"]*?)|([^\"]+\"[^\"]+\".*?))(?=\\||$)";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
while(m.find()) {
System.out.println(m.group());
}
Output:
zzz
20-03-22
23 "xx|zz"
23-23-33
32
Upvotes: 0
Reputation: 597362
You can replace the | inside the values with its Unicode escape sequence (\u007C) before the columns are joined with the pipe delimiter.
But what you should really do is adjust your parser to handle quoted values, rather than changing the files.
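One possible shape for such a parser is a single pass over the line that tracks whether it is currently inside double quotes. This is a rough sketch, not a full CSV-style parser: the class QuoteAwareSplitter and its split method are made-up names, and it assumes quotes are balanced and never escaped inside a value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes balanced quotes and no escaped quotes in values.
final class QuoteAwareSplitter {
    // Splits on the delimiter, except where it appears between double quotes.
    static List<String> split(String line, char delimiter) {
        List<String> cols = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;          // toggle state; the quote itself is dropped
            } else if (c == delimiter && !inQuotes) {
                cols.add(current.toString());  // delimiter outside quotes ends the column
                current.setLength(0);
            } else {
                current.append(c);             // ordinary character, or delimiter inside quotes
            }
        }
        cols.add(current.toString());          // last column has no trailing delimiter
        return cols;
    }

    public static void main(String[] args) {
        System.out.println(split("\"xx|zz\"|23-23-33|32", '|')); // prints [xx|zz, 23-23-33, 32]
    }
}
```

For real-world files with escaped quotes or multi-line values, an existing CSV library configured with | as the separator is probably a safer choice than hand-rolled code.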
Upvotes: 1