Reputation: 391
I'm looking for a regex to use in Notepad++ that finds the following format, where [HERE] should NOT contain any double quotes at all:
,"[HERE]",
Basically, I'm looking to find any additional quotes between commas in a quoted field.
Invalid: ,"hello "there"",
Invalid: ,"hello there"",
Invalid: ,"hell "o there",
Invalid: ,"""""""",
VALID: ,"hello there",
VALID: ,"",
I've tried all kinds of examples and tried making my own but just can't get my head around this.
The closest I've come is:
("[^",]+)"([^",]+")"
Demo: http://regexr.com/3enk2
but this will only match explicit examples such as ,"Example" Place"", and not others such as ,"Example",
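For reference, here is a quick check of the attempt against the four invalid examples. It uses Python's re module as a stand-in for Notepad++'s search, which is an assumption, though both should treat this basic pattern the same way:

```python
import re

# The closest attempt from the question, copied unchanged.
attempt = re.compile(r'("[^",]+)"([^",]+")"')

# The four invalid examples from the question.
invalid = [
    ',"hello "there"",',
    ',"hello there"",',
    ',"hell "o there",',
    ',' + '"' * 8 + ',',  # the all-quotes example
]

for text in invalid:
    flagged = attempt.search(text) is not None
    print(text.ljust(20), "flagged" if flagged else "missed")
```

Only ,"hello "there"", is flagged. The attempt needs non-quote text on both sides of the stray quote followed by two quotes in a row, so the other three shapes slip through.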
Any help appreciated!
Upvotes: 0
Views: 742
Reputation: 14047
To find correctly balanced quotes, search for ,"[^"]*",
To find unexpected quotes, search for ,"[^",]*("[^",]*)+",
Note the commas inside the square brackets of the invalid-quote pattern. Excluding commas there may be wrong for your data, but if it is, you would need stronger rules about where commas can appear.
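A sketch that runs both patterns over the question's six examples, again using Python's re module as a stand-in for Notepad++'s search (an assumption; these patterns use only basic constructs):

```python
import re

# The two patterns from this answer.
balanced = re.compile(r',"[^"]*",')
unexpected = re.compile(r',"[^",]*("[^",]*)+",')

# The question's examples and how the question labels them.
examples = {
    ',"hello "there"",': "invalid",
    ',"hello there"",': "invalid",
    ',"hell "o there",': "invalid",
    ',' + '"' * 8 + ',': "invalid",  # the all-quotes example
    ',"hello there",': "valid",
    ',"",': "valid",
}

for text, label in examples.items():
    verdict = "unexpected quote" if unexpected.search(text) else "ok"
    print(label.ljust(8), text.ljust(20), verdict)
```

Every example labelled invalid is caught by the second pattern, and only the two valid ones are matched by the first.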
To explain the regular expressions for the valid and invalid cases: both start with ," and finish with ", and these deal with the characters surrounding the [HERE] text shown in the question. The rest of each regular expression handles the contents of the [HERE]. The valid case is zero or more characters that are not a quote, which is a simple match for [^"]*.
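The invalid case has one or more quotes, which can have other non-quote characters on either side. Invalid examples of the [HERE] include xx"xx and xxx"x"xxxxx"xx" and "xx""xx". All these invalid cases can be described as zero or more non-quote characters, followed by one or more repetitions of a quote and zero or more non-quote characters.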
In a regular expression, a character that is not a quote is [^"]. Zero or more of them is [^"]*. A sequence of things is enclosed in parentheses, and one or more of a sequence is (...)+, or in this case ("[^"]*)+.
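The building blocks above can be composed programmatically to check that every [HERE] either contains no quotes or has the invalid shape. This is a hypothetical Python sketch: the variable and function names are illustrative, and fullmatch is used because it tests the [HERE] text on its own, without the surrounding ," and ",:

```python
import re

# Building blocks from the explanation (quote rule only, no comma rule yet).
not_quote = r'[^"]'                    # one character that is not a quote
non_quotes = not_quote + "*"           # zero or more of them
quote_runs = '("' + non_quotes + ")+"  # one or more of: a quote, then non-quotes

valid_here = re.compile(non_quotes)
invalid_here = re.compile(non_quotes + quote_runs)


def classify(here: str) -> str:
    """Classify the [HERE] text on its own, without the surrounding ," and ",."""
    if valid_here.fullmatch(here):
        return "valid"
    if invalid_here.fullmatch(here):
        return "invalid"
    return "unmatched"


for here in ['xx"xx', 'xxx"x"xxxxx"xx"', '"xx""xx"', "hello there", ""]:
    print(repr(here), classify(here))
```

The three invalid examples from the text come out as invalid, and hello there and the empty string come out as valid; no input is left unmatched, because any text with at least one quote fits the invalid shape.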
The question does not specify how commas within the [HERE] should be treated. This answer assumes that they are not allowed, and makes that explicit by adding a comma to the "not a quote" terms, giving [^",].
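To see what the comma assumption costs, here is a small Python check using hypothetical fields that contain commas (these inputs are not from the question). The valid pattern accepts a comma inside the field, but a field containing a comma can never match the invalid pattern, even when it also has stray quotes:

```python
import re

balanced = re.compile(r',"[^"]*",')               # commas allowed inside the field
unexpected = re.compile(r',"[^",]*("[^",]*)+",')  # commas excluded inside the field

# Hypothetical fields that contain a comma.
samples = [
    ',"hello, there",',    # no stray quotes
    ',"hello, "there"",',  # stray quotes as well as a comma
]

for text in samples:
    print(text.ljust(20), bool(balanced.search(text)), bool(unexpected.search(text)))
```

The second field is reported by neither search, which is the gap the stronger comma rules mentioned earlier would need to close.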
Assembling the pieces of the invalid match, we get:
," // Opening characters
[^",] // Character that is neither quote nor comma
* // zero or more of them
( // Enclose the sequence
" // a real quote
[^",]* // Zero or more characters that are neither quote nor comma
) // End of the sequence
+ // one or more of the sequence
", // Closing characters
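If the pattern is used from code rather than from the Notepad++ search box, Python's re.VERBOSE flag allows the annotated layout above to be kept almost as written, with # comments in place of //. A sketch, checked against the one-line form:

```python
import re

# The annotated breakdown, kept in re.VERBOSE form.
# In verbose mode, whitespace outside character classes is ignored and '#' starts a comment.
unexpected_verbose = re.compile(r'''
    ,"          # opening characters
    [^",]*      # zero or more characters that are neither quote nor comma
    (           # enclose the sequence
      "         # a real quote
      [^",]*    # zero or more characters that are neither quote nor comma
    )+          # one or more of the sequence
    ",          # closing characters
''', re.VERBOSE)

# The same pattern on one line, as typed into the search box.
unexpected_compact = re.compile(r',"[^",]*("[^",]*)+",')

for text in [',"hello "there"",', ',"hell "o there",', ',"hello there",', ',"",']:
    print(text.ljust(20), bool(unexpected_verbose.search(text)), bool(unexpected_compact.search(text)))
```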
Upvotes: 2
Reputation: 785216
You can use this regex, with anchors and a negated character class, to match a line starting and ending with a comma and containing non-comma, non-double-quote content in between:
^,"[^",]*",$
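A minimal Python sketch of the anchored pattern. re.MULTILINE is used here so that ^ and $ apply to each line of the sample text; the last sample line is a hypothetical two-field row, added to show that the anchors only accept lines made of a single quoted field:

```python
import re

# Anchored pattern from this answer; MULTILINE makes ^ and $ match at each line.
whole_line = re.compile(r'^,"[^",]*",$', re.MULTILINE)

text = "\n".join([
    ',"hello there",',
    ',"",',
    ',"hello "there"",',
    ',"a","b",',  # hypothetical line with two fields
])

print(whole_line.findall(text))
```

Only the first two lines are returned; the stray-quote line and the two-field line are both rejected.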
Upvotes: 1