Reputation: 75
I need a regex to count the number of columns in a pipe-delimited string in Java. The column data will always be enclosed in double quotes or it will be empty.
e.g.:
"1234"|"Name"||"Some description with ||| in it"|"Last Column"
The above should be counted as 5 columns, including one empty column after the "Name" column.
Thanks
Upvotes: 6
Views: 1819
Reputation: 930
Here's a regex I used a while back that also deals with escaped quotes AND escaped delimiters. It's probably overkill for your requirements (counting columns) but perhaps it'll help you or someone else in the future with their parsing.
(?<=^|(?<!\\)\|)(\".*?(?<=[^\\])\"|.*?(?<!\\(?=\|))(?=")?|)(?=\||$)
and broken down as:
(?<=^|(?<!\\)\|) // look behind to make sure the token starts with the start anchor (first token) or a delimiter (but not an escaped delimiter)
( // start of capture group 1
\".*?(?<=[^\\])\" // a token bounded by quotes
| // OR
.*?(?<!\\(?=\|))(?=")? // a token not bounded by quotes, any characters up to the delimiter (unless escaped)
| // OR
// empty token
) // end of capture group 1
(?=\||$) // look ahead to make sure the token is followed by either a delimiter or the end anchor (last token)
When you actually use it in a Java string literal, the backslashes and quotes have to be escaped:
(?<=^|(?<!\\\\)\\|)(\\\".*?(?<=[^\\\\])\\\"|.*?(?<!\\\\(?=\\|))(?=\")?|)(?=\\||$)
It's complicated, but there's a method to this madness: other regular expressions I googled would fall over if a column at the start or end of the line was empty, if delimited quotes were in odd places, if the line or a column started or ended with an escaped delimiter, and in a bunch of other edge cases.
The fact that you're using a pipe as the delimiter makes this regex even harder to read. A tip: where you see a pipe by itself, "|", it is the regex alternation (OR) operator; where it is escaped, "\|", it is your delimiter.
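For the counting case in the question, a minimal sketch of how the escaped pattern above could be used: every successful find() is one token, so the number of matches is the number of columns. The class and method names (RegexTokenCounter, countColumns) are made up for illustration; only the pattern string comes from this answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class RegexTokenCounter {
    // The escaped pattern from above, copied verbatim as a Java string literal.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<=^|(?<!\\\\)\\|)(\\\".*?(?<=[^\\\\])\\\"|.*?(?<!\\\\(?=\\|))(?=\")?|)(?=\\||$)");

    // Counts tokens: each match (including an empty one) is one column.
    static int countColumns(String line) {
        Matcher m = TOKEN.matcher(line);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String line =
            "\"1234\"|\"Name\"||\"Some description with ||| in it\"|\"Last Column\"";
        System.out.println(countColumns(line));
    }
}
```

If you need the column values rather than the count, m.group(1) inside the loop gives the token text, still including its surrounding quotes.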
Upvotes: 1
Reputation: 33908
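A slightly improved version of the expressions in aioobe's answer:
// regex: "(?:[^"\\]+|\\.)*"|[^|]+  (backslashes doubled again for the Java string literal)
int cols = input.replaceAll("\"(?:[^\"\\\\]+|\\\\.)*\"|[^|]+", "")
                .length() + 1;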
Handles escapes in quotes, and uses a single expression to remove everything except the delimiters.
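To see the escape handling in action, here is a small sketch that wraps the single-expression idea in a helper and feeds it a field containing both an escaped quote and a pipe. It assumes quotes are escaped with a backslash, and the regex is written with the backslashes doubled as a Java string literal requires. SingleRegexCounter and countColumns are names invented for the example.

```java
// Hypothetical helper, not part of any library.
final class SingleRegexCounter {
    static int countColumns(String input) {
        // Regex (unescaped): "(?:[^"\\]+|\\.)*"|[^|]+
        // First alternative: a quoted field, where \x is consumed as an escaped pair.
        // Second alternative: any other run of non-pipe characters.
        // Removing both leaves only the delimiters that sit outside quotes.
        return input.replaceAll("\"(?:[^\"\\\\]+|\\\\.)*\"|[^|]+", "").length() + 1;
    }

    public static void main(String[] args) {
        // The raw line is: "a \" | b"|"c"
        String tricky = "\"a \\\" | b\"|\"c\"";
        System.out.println(countColumns(tricky));
    }
}
```

The pipe inside the first field is swallowed together with the quoted field, so only the one real delimiter is left to count.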
Upvotes: 2
Reputation: 421020
Here's one way to do it:
String input =
    "\"1234\"|\"Name\"||\"Some description with ||| in it\"|\"Last Column\"";
//  \_______/ \______/\/\_________________________________/ \_____________/
//      1        2    3                  4                         5
int cols = input.replaceAll("\"[^\"]*\"", "")   // remove "..."
                .replaceAll("[^|]", "")         // remove anything other than |
                .length() + 1;                  // count the remaining |, add 1
System.out.println(cols); // 5
IMO it's not very robust though. I wouldn't recommend using regular expressions if you plan on handling escaped quotes, for instance.
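If escaped quotes are a possibility, a plain character scan is one non-regex alternative. The following is only a sketch under the assumption that a backslash escapes the next character inside a quoted field (the question does not say how quotes would be escaped); PipeColumnScanner and countColumns are made-up names.

```java
// Hypothetical helper, not part of any library.
final class PipeColumnScanner {
    // Counts columns by counting the pipes that occur outside double quotes.
    static int countColumns(String line) {
        int cols = 1;               // n delimiters separate n + 1 columns
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes && c == '\\' && i + 1 < line.length()) {
                i++;                // assumed escape rule: skip the escaped character
            } else if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == '|' && !inQuotes) {
                cols++;
            }
        }
        return cols;
    }

    public static void main(String[] args) {
        String input =
            "\"1234\"|\"Name\"||\"Some description with ||| in it\"|\"Last Column\"";
        System.out.println(countColumns(input)); // 5
    }
}
```

Note that this counts an empty line as one (empty) column, the same as the replaceAll version does.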
Upvotes: 8