Aditya

Reputation: 1268

Regex against CSV, what am I doing wrong

I am using a regex in less to find the rows where the 6th column is an empty string ("").

I used the following regex:

^(.*?,){5}"",

It matches this:

a,b,c,d,e,""

but it also matches this, where "" is the 8th column:

a,b,c,d,e,f,g,"",

What am I doing wrong?

Upvotes: 0

Answers (1)

The fourth bird

Reputation: 163362

Your regex ^(.*?,){5}"" uses a non-greedy group (.*?,) that tries to match as little as possible, and repeats it 5 times. In the second example, the first 4 repetitions match a,b,c,d,. In the fifth repetition it first tries e, followed by "", but the next character is f, so that attempt fails.

Instead of giving up, the engine backtracks into the fifth repetition: the lazy .*? expands one character at a time, and because the dot also matches a comma, it can consume e,f,g before the required comma. The fifth repetition therefore matches e,f,g, which is followed by "", so the whole regex matches.
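The backtracking can be observed with Python's re module. This is a minimal sketch that assumes Python's regex engine behaves like the one used by less for this particular pattern; the helper name fifth_repetition is hypothetical. A quantified group only keeps the text of its last iteration, so group 1 shows what the fifth repetition ended up matching.

```python
import re

# Pattern as quoted in the answer (the question's version has an extra trailing comma).
LAZY = re.compile(r'^(.*?,){5}""')

def fifth_repetition(row):
    """Return what the 5th repetition of (.*?,) captured, or None if no match."""
    m = LAZY.search(row)
    # A repeated group only keeps its last iteration, i.e. the 5th one.
    return m.group(1) if m else None

for row in ['a,b,c,d,e,""', 'a,b,c,d,e,f,g,"",']:
    print(repr(row), '->', repr(fifth_repetition(row)))
```

For the first row the fifth repetition is just e, while for the second row it has grown to e,f,g, which is why that row is matched too.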

Using a CSV parser would be the better option. If you want to use a regex for your example data, you can use a negated character class to match one or more characters that are not a comma or a line break, repeat 4 times a group that matches a comma followed by such a field, and then match ,"".
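With a CSV parser, the 6th column is found by counting parsed fields rather than commas. A minimal sketch using Python's standard csv module; the function name rows_with_empty_sixth is hypothetical:

```python
import csv
import io

def rows_with_empty_sixth(text):
    """Yield parsed rows whose 6th field (index 5) is empty."""
    for row in csv.reader(io.StringIO(text)):
        # csv.reader turns a quoted "" into an empty string; an unquoted
        # empty field also becomes '', so the two are not distinguished.
        if len(row) > 5 and row[5] == '':
            yield row

sample = 'a,b,c,d,e,""\na,b,c,d,e,f,g,"",\n'
for row in rows_with_empty_sixth(sample):
    print(row)
```

Unlike a comma-counting regex, the parser also handles quoted fields that contain commas, such as "1,2".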

To also allow more comma-separated fields after that, match a comma followed by one or more characters that are not a comma or a line break, repeat that group zero or more times, and assert the end of the string with $:

^[^,\r\n]+(?:,[^,\r\n]+){4},""(?:,[^,\r\n]+)*$
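The pattern can be checked against the example rows with Python's re module. This is a sketch assuming Python's regex syntax is close enough to the flavor used by less; non-capturing groups (?:...) and \r\n inside a character class may not be supported by every regex implementation. The helper name sixth_is_empty is hypothetical.

```python
import re

# Five non-empty fields, then "" as the 6th field, then zero or more
# further non-empty fields up to the end of the line.
SIXTH_EMPTY = re.compile(r'^[^,\r\n]+(?:,[^,\r\n]+){4},""(?:,[^,\r\n]+)*$')

def sixth_is_empty(line):
    """True if the line's 6th comma-separated field is exactly ""."""
    return SIXTH_EMPTY.search(line) is not None

for line in ['a,b,c,d,e,""', 'a,b,c,d,e,"",g,h', 'a,b,c,d,e,f,g,"",']:
    print(repr(line), sixth_is_empty(line))
```

Because each field is matched with [^,\r\n]+, every field must be non-empty: a row with a trailing comma such as a,b,c,d,e,"", or with an empty field elsewhere does not match.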

Upvotes: 1
