Reputation: 2582
Let's say I have a file of strings like
11,"abc","def"
12,"ab "c"","def" // invalid
13,"ab,"c"","def" // invalid
14,""a" b,c","def" // invalid
15,""a", "b"c","def" // invalid
As you can see, some of the double quotes are unescaped. I'd like to filter out the invalid strings before I try to parse them.
I'm thinking of using something like \,\".+\"\, to find a token and then checking that it doesn't contain "," inside, but I can't figure out how to make it work.
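For illustration, the "check each token" idea can also be turned around: instead of hunting for bad tokens, accept a line only if every field is well formed. The sketch below is one possible reading, not a definitive answer. It assumes that a field is either unquoted or wrapped in double quotes, and that an escaped inner quote is written as \" (the sample data does not say which escaping convention applies). The class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class QuotedLineFilter {
    // Assumption: a field is either unquoted (no quotes, no commas) or wrapped in
    // double quotes, where any inner quote must be backslash-escaped as \".
    private static final String FIELD = "(?:\"(?:[^\"\\\\]|\\\\.)*\"|[^\",]*)";

    // A valid line is one field followed by zero or more ",field" groups.
    private static final Pattern VALID_LINE =
            Pattern.compile(FIELD + "(?:," + FIELD + ")*");

    // matches() requires the whole line to fit the pattern, so any stray quote
    // that cannot be part of a well-formed field makes the line invalid.
    static boolean isValid(String line) {
        return VALID_LINE.matcher(line).matches();
    }

    // Keeps only the lines that pass the whole-line check.
    static List<String> keepValid(List<String> lines) {
        return lines.stream()
                .filter(QuotedLineFilter::isValid)
                .collect(Collectors.toList());
    }
}
```

With the sample lines above (without the trailing // invalid notes), only the first line passes this check. If the file uses doubled quotes ("") as the escape instead, the FIELD pattern would need to change accordingly.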
I've searched SO but haven't found an answer that works for me.
Thank you.
Upvotes: 1
Views: 352
Reputation: 5385
If each string field always starts and ends with ", you can try this Java regex:
(?<=,\s{0,99}"|(?!\A)\G)[^"]+|(?<=(?!\A)\G|")(")(?!\s*[,\n]|$)
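Group 1 captures the invalid quotes; you can get their indices with matcher.start(1) and matcher.end(1). The \s{0,99} inside the lookbehind relies on Java's support for bounded quantifiers in lookbehind, so it may not work in other regex engines.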
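A minimal sketch of how the regex might be driven from Java, assuming one record per input string. The class and method names are hypothetical. The loop skips matches where group 1 did not participate, because the first alternative of the regex only consumes field contents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InvalidQuoteFinder {
    // The regex from this answer, escaped for a Java string literal.
    private static final Pattern INVALID_QUOTE = Pattern.compile(
            "(?<=,\\s{0,99}\"|(?!\\A)\\G)[^\"]+|(?<=(?!\\A)\\G|\")(\")(?!\\s*[,\\n]|$)");

    // Returns the start index of every quote captured by group 1.
    // An empty list means no invalid quote was found in the line.
    static List<Integer> invalidQuoteIndices(String line) {
        List<Integer> indices = new ArrayList<>();
        Matcher matcher = INVALID_QUOTE.matcher(line);
        while (matcher.find()) {
            // group(1) is null when the match came from the first alternative.
            if (matcher.group(1) != null) {
                indices.add(matcher.start(1));
            }
        }
        return indices;
    }
}
```

A line can then be filtered out whenever invalidQuoteIndices(line) is non-empty.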
Upvotes: 1