tmoore82

Reputation: 1875

Regex Not Adhering to Value in Curly Braces

I'm using the following regex in Notepad++ v6.6.8:

\n(".*?",){14}""

It matches lines that it should, but it also matches lines that it should not. For example, I do not expect it to match the following line:

"Sold","421","421","67","1/9/2007 12:00:00 AM","","3","","","","","","","","1/9/2007 12:00:00 AM","","","","True","4601","1/3/2011 5:44:17 PM",""

However, it matches up through the second occurrence of the datetime. Out of curiosity, I changed the value within the curly braces to 15, and it returned exactly the same match. Can someone explain to me why this is? I'm trying to get a quick count of every record in a CSV file where the 15th position is empty (""), and I think the result is off by a few thousand records.

Upvotes: 0

Views: 42

Answers (3)

Avinash Raj

Reputation: 174756

The .*? in (".*?",){14}"" is lazy, but . still matches quotes and commas, so a single repetition can span more than one field. On its own, (".*?",){14} matches exactly the first 14 fields. With the trailing "", the regex engine must also find "" immediately after the 14th repetition. In that position there is "1/9/2007 12:00:00 AM", which is not "", so the engine backtracks and extends the 14th repetition until it ends just before the next "". That is why the match runs up through the second datetime.

See the difference by removing the following "" from the pattern. Here and here.
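The same behavior can be reproduced outside Notepad++. This is a minimal sketch in Python, assuming its re module backtracks the same way as Notepad++'s engine for this pattern. The sample row is rebuilt field by field (22 fields assumed, with the second datetime in position 15); the variable names are illustrative only.

```python
import re

# The question's sample row, rebuilt field by field (22 fields assumed).
fields = (["Sold", "421", "421", "67", "1/9/2007 12:00:00 AM", "", "3"]
          + [""] * 7
          + ["1/9/2007 12:00:00 AM", "", "", "", "True", "4601",
             "1/3/2011 5:44:17 PM", ""])
line = ",".join('"%s"' % f for f in fields)

m14 = re.match(r'(".*?",){14}""', line)   # pattern from the question
m15 = re.match(r'(".*?",){15}""', line)   # same pattern with {15}
bare = re.match(r'(".*?",){14}', line)    # trailing "" removed

# Both quantifiers end up covering the same text.
same_span = m14.group(0) == m15.group(0)

# Count the field terminators (",) each match consumed.
fields_in_bare = bare.group(0).count('",')
fields_in_m14 = m14.group(0).count('",')

print(same_span, fields_in_bare, fields_in_m14)
```

Without the trailing "" the pattern stops after 14 fields. With it, the {14} version swallows 15 fields before the "", which is exactly what {15} does.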

Upvotes: 1

Frank V

Reputation: 25429

According to RegexBuddy, it does match that line. I'm not entirely clear on what it should or shouldn't match given the example data, but RegexBuddy agrees that it's a match.

[Screenshot: RegexBuddy showing match]

Can someone explain to me why this is?

I think it's because of (".*?",), specifically the fact that it can match a pair of quotes without any contents. This allows any quotes to be counted against the quantifier.

Upvotes: 0

Ryan M

Reputation: 2112

The 14th instance of the first part of the pattern is matching "","1/9/2007 12:00:00 AM", which is two fields at once. Just because the match is not greedy does not mean it won't extend if it needs to in order to make a match.
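To see the extension on a smaller scale, here is a rough sketch in Python (assuming its re module behaves like Notepad++'s engine here). The four-field string and the {2} quantifier are made up for illustration; they are not from the question's data.

```python
import re

# Hypothetical four-field row: the 3rd field is NOT empty.
line = '"a","","b",""'

# Two lazy repetitions followed by an empty field.
m = re.match(r'(".*?",){2}""', line)

# group(1) holds the text captured by the LAST repetition of the group.
last_repetition = m.group(1)
print(last_repetition)

# A negated character class cannot cross a quote, so this does not match.
strict = re.match(r'("[^"]*",){2}""', line)
print(strict)
```

The last repetition of the lazy group grows to cover both the empty second field and "b", so that the trailing "" can land on the fourth field.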

You might try something like this, where [^"]* cannot run past a closing quote:

\n("[^"]*",){14}""

Or use the ^ anchor instead of matching the newline, which also lets the first line of the file match:

^("[^"]*",){14}""
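If the goal is only a count, the check can also be scripted. Below is a sketch in Python, not a definitive solution: it counts rows with the anchored pattern above, cross-checks with the standard csv module, and shows how the original lazy pattern overcounts. The make_row helper and the two sample rows are hypothetical, and it assumes no field contains an embedded quote or newline.

```python
import csv
import io
import re

# Anchored pattern from above; MULTILINE makes ^ match at every line start.
STRICT = re.compile(r'^("[^"]*",){14}""', re.MULTILINE)
# Original lazy pattern from the question, anchored the same way.
LAZY = re.compile(r'^(".*?",){14}""', re.MULTILINE)

def make_row(fields):
    """Hypothetical helper: quote every field and join with commas."""
    return ",".join('"%s"' % f for f in fields)

def count_with_regex(pattern, text):
    """Count the lines on which the pattern matches."""
    return sum(1 for _ in pattern.finditer(text))

def count_with_csv(text, position=15):
    """Count rows whose field at the 1-based position is empty."""
    reader = csv.reader(io.StringIO(text))
    return sum(1 for row in reader
               if len(row) >= position and row[position - 1] == "")

rows = [
    make_row(["x"] * 14 + ["", "y"]),       # field 15 empty: should count
    make_row(["x"] * 14 + ["filled", ""]),  # field 15 filled: should not
]
text = "\n".join(rows) + "\n"

strict_count = count_with_regex(STRICT, text)
lazy_count = count_with_regex(LAZY, text)
csv_count = count_with_csv(text)
print(strict_count, lazy_count, csv_count)
```

The strict pattern and the csv-based count agree on these two rows, while the lazy pattern also counts the second row. For real data with embedded quotes or newlines inside fields, the csv approach is likely the safer of the two.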

Upvotes: 1
