Reputation: 1875
I'm using the following regex in Notepad++ v6.6.8
\n(".*?",){14}""
It matches lines that it should, but it also matches lines that it should not. For example, I do not expect it to match the following line:
"Sold","421","421","67","1/9/2007 12:00:00 AM","","3","","","","","","","","1/9/2007 12:00:00 AM","","","","True","4601","1/3/2011 5:44:17 PM",""
However, it matches up through the second occurrence of the datetime. Out of curiosity, I changed the value within the curly braces to 15, and it returned exactly the same match. Can someone explain to me why this is? I'm trying to get a quick count of every record in a CSV file where the 15th position is empty (""), and I think the result is off by a few thousand records.
Upvotes: 0
Views: 42
Reputation: 174756
(".*?",){14}"" matches more than you expect because of backtracking, even though .*? is lazy. The (".*?",){14} part first matches exactly 14 fields, and the regex engine then tries to match the following "". But there isn't a "" after the 14th field; in that place there is the string "1/9/2007 12:00:00 AM", which is not "". So the regex engine backtracks and lets the .*? in the 14th repetition grow past that string, up to the next ", that is followed by "". Once it finds that position, it matches up to it in order to get a match.
See the difference by removing the trailing "" from the pattern.
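The difference can also be reproduced outside Notepad++. Here is a minimal sketch using Python's re module (an assumption on my part: Notepad++ uses its own regex engine, but a lazy quantifier backtracks the same way for this pattern). The record is rebuilt from the sample line in the question, and the variable names are only illustrative.

```python
import re

# Rebuild the sample record from the question: 22 quoted fields,
# where the 15th field is the second datetime (not empty).
fields = (
    ["Sold", "421", "421", "67", "1/9/2007 12:00:00 AM", "", "3"]
    + [""] * 7
    + ["1/9/2007 12:00:00 AM", "", "", "", "True", "4601",
       "1/3/2011 5:44:17 PM", ""]
)
line = ",".join('"%s"' % f for f in fields)

# Without the trailing "", each lazy repetition stops at its own field.
without_tail = re.match(r'(".*?",){14}', line)
# With the trailing "", the 14th repetition has to grow to find a match.
with_tail = re.match(r'(".*?",){14}""', line)
# Changing the count to 15 ends at the same place.
fifteen = re.match(r'(".*?",){15}""', line)

# group(1) holds the text captured by the last repetition of the group.
print(without_tail.group(1))  # "",
print(with_tail.group(1))     # "","1/9/2007 12:00:00 AM",
print(with_tail.group(0) == fifteen.group(0))  # True
```

In the second pattern the last repetition swallows two fields, which is why {14} and {15} produce the same overall match.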
Upvotes: 1
Reputation: 25429
According to RegexBuddy, it does match that line. I'm not entirely clear what it should or shouldn't match given the example data, but RegexBuddy agrees that it's a match.
Can someone explain to me why this is?
I think it's because of (".*?",), specifically the fact that you can match the quotes without any contents. This will allow any quotes to be counted against the quantifier.
Upvotes: 0
Reputation: 2112
The 14th instance of the first part of the pattern is matching "","1/9/2007 12:00:00 AM", which spans two fields. Just because the match is not greedy does not mean it won't extend if it needs to in order to make a match.
You might try something like the following, where [^"]* cannot run past a closing quote:
\n("[^"]*",){14}""
Or use the ^ anchor instead of matching the newline, so that the first line of the file can match as well:
^("[^"]*",){14}""
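To check the count itself, here is a rough sketch in Python's re module (again an assumption, since the question is about Notepad++). The three records are made-up test data, make_record is a hypothetical helper, and the sketch assumes no field contains an embedded double quote.

```python
import re

def make_record(fifteenth, sixteenth):
    """Build a hypothetical 16-field record; only fields 15 and 16 vary."""
    fields = ["x"] * 14 + [fifteenth, sixteenth]
    return ",".join('"%s"' % f for f in fields)

# Made-up sample: only the first record has an empty 15th field.
text = "\n".join([
    make_record("", "tail"),
    make_record("1/9/2007 12:00:00 AM", ""),  # empty 16th field, not 15th
    make_record("y", "tail"),
])

# re.MULTILINE makes ^ match at the start of every line.
lazy = re.compile(r'^(?:".*?",){14}""', re.MULTILINE)
strict = re.compile(r'^(?:"[^"]*",){14}""', re.MULTILINE)

lazy_count = len(lazy.findall(text))
strict_count = len(strict.findall(text))

print(lazy_count)    # 2: the second record is counted by mistake
print(strict_count)  # 1: only the record whose 15th field is empty
```

The lazy version overcounts any record that has an empty field somewhere after the 15th position, which would explain a count that is too high.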
Upvotes: 1