Reputation: 1268
I am using a regex in less to find the rows where the 6th column is an empty "".
I used the following regex:
^(.*?,){5}"",
But it matches this:
a,b,c,d,e,""
and also matches this:
a,b,c,d,e,f,g,"",
What am I doing wrong?
Upvotes: 0
Views: 70
Reputation: 163362
Your regex ^(.*?,){5}"" uses a non-greedy group (.*?,) that matches as few characters as possible while still allowing an overall match, and repeats that group 5 times.
For the second example, the first 4 repetitions match a,b,c,d, and the fifth repetition first tries to match e, but that is not followed by "" so there is no match at that point.
The regex engine then backtracks and lets the fifth repetition of (.*?,) take more characters. The dot also matches a comma, so the group keeps expanding until it ends in a comma that is followed by a double quote, which means the fifth repetition matches e,f,g, as a whole.
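The backtracking described above can be observed directly. This is a minimal sketch using Python's re module as a stand-in for whatever regex library your build of less uses (an assumption; the lazy-quantifier behaviour shown here is the same in PCRE-style engines). A repeated capture group keeps only its last iteration, so group 1 shows what the fifth repetition consumed.

```python
import re

# The pattern as quoted in this answer (no trailing comma).
pattern = re.compile(r'^(.*?,){5}""')

line = 'a,b,c,d,e,f,g,"",'
m = pattern.match(line)

# A repeated capture group only remembers its final (fifth) iteration.
fifth_repetition = m.group(1)
whole_match = m.group(0)

print(fifth_repetition)  # e,f,g,
print(whole_match)       # a,b,c,d,e,f,g,""
```

The fifth repetition spans three fields because the dot is allowed to cross commas, which is why the row with "" in the 8th column is reported as a match.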
Using a CSV parser would be the better option. If you want to use a regex for your example data, you can match one or more characters other than a comma or a line break using a negated character class, followed by a group repeated 4 times that matches a comma and again one or more characters other than a comma or a line break, and then match ,""
To also allow comma-separated data after the empty column, you can repeat a group zero or more times that matches a comma followed by one or more characters other than a comma or a line break, and then assert the end of the string with $
^[^,\r\n]+(?:,[^,\r\n]+){4},""(?:,[^,\r\n]+)*$
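As a quick check of the pattern above, and of the CSV-parser alternative, here is a small sketch in Python. The function names and sample rows are hypothetical and only for illustration; Python's re and csv modules stand in for whatever tooling you actually use. Note one difference: the csv version also accepts an unquoted empty field (two adjacent commas), whereas the regex requires a literal "".

```python
import csv
import io
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r'^[^,\r\n]+(?:,[^,\r\n]+){4},""(?:,[^,\r\n]+)*$')


def sixth_is_empty_regex(line: str) -> bool:
    """True if the line has five non-empty fields followed by a literal ""."""
    return PATTERN.match(line) is not None


def sixth_is_empty_csv(line: str) -> bool:
    """True if a CSV parser reports the 6th field as an empty string."""
    row = next(csv.reader(io.StringIO(line)), [])
    return len(row) >= 6 and row[5] == ""


samples = [
    'a,b,c,d,e,""',        # "" in column 6
    'a,b,c,d,e,"",g',      # "" in column 6, more data after it
    'a,b,c,d,e,f,g,"",',   # "" in column 8, should be rejected
]

for s in samples:
    print(s, sixth_is_empty_regex(s), sixth_is_empty_csv(s))
```

Both checks accept the first two sample rows and reject the third, because neither one lets a single field run across a comma.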
Upvotes: 1