gbs

Reputation: 7266

Need explanation on this regex

I have this regex used to split a string:

,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)

Example string:

"Field1","Field2","item1,item2,item3","Hello,""John"""

The one thing I understand is that it splits the string on , but I am not sure about anything after that.

Can anyone explain this regex, please?

If you can dissect it to the simplest possible level, I would appreciate it.

Upvotes: 1

Views: 606

Answers (2)

anubhava

Reputation: 785058

This regex matches a comma , only if it is outside double quotes. It does this by checking that the comma is followed by an even number of quotes up to the end of the string.


Explanation:

, -> match literal comma
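(?=...) -> positive lookahead: what follows must match, but is not consumed
[^"]*" -> match zero or more non-quote characters followed by a literal "
[^"]*"[^"]*" -> match a pair of the above, i.e. text containing exactly 2 quotes
(?:[^"]*"[^"]*")* -> match 0 or more such pairs (0, 2, 4, 6, ... quotes)
[^"]*$ -> followed by any non-quote characters up to the end of the string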
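
To make the breakdown above easier to follow, here is a minimal sketch of the same pattern written with Python's re.VERBOSE flag, so that each piece sits on its own commented line. Python is only my choice for illustration (the question does not say which language is used), and the name SPLIT_ON_UNQUOTED_COMMA and the sample string are made up for this example.

```python
import re

# Same regex as in the question, spread out with re.VERBOSE so that
# whitespace and "#" comments inside the pattern are ignored.
SPLIT_ON_UNQUOTED_COMMA = re.compile(r'''
    ,                 # match a literal comma
    (?=               # positive lookahead: the rest of the string must look like...
        (?:
            [^"]*"    #   non-quote characters, then a quote
            [^"]*"    #   non-quote characters, then a second quote (one pair)
        )*            #   ...repeated 0 or more times (0, 2, 4, ... quotes)
        [^"]*$        #   then only non-quote characters up to the end
    )
''', re.VERBOSE)

# Hypothetical sample: the comma inside "b,c" is followed by one quote (odd),
# so it is not a split point.
parts = SPLIT_ON_UNQUOTED_COMMA.split('a,"b,c",d')
print(parts)
```

The lookahead never consumes anything, so only the comma itself is removed by the split; the quotes stay in the resulting fields.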

Example Input:

"Field1,Field2","Field3","item1,item2,item3"
  • First, it will match the , before "Field3" because the lookahead (?=(?:[^"]*"[^"]*")*[^"]*$) succeeds: there are 4 double quotes after this comma.
  • Second, it will match the , after "Field3" because the lookahead (?=(?:[^"]*"[^"]*")*[^"]*$) succeeds: there are 2 double quotes after this comma.
  • It will not match the comma between Field1 and Field2 because the number of quotes after it is odd (5), so the lookahead (?=(?:[^"]*"[^"]*")*[^"]*$) fails. The same applies to the commas between item1, item2 and item3 (1 quote after each).
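
The quote counting described above can be checked with a short script. This is only a sketch in Python (my choice of language for illustration); the variable names are hypothetical. For every comma in the example input it counts the quotes that follow and tests whether the regex matches at that position.

```python
import re

PATTERN = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

line = '"Field1,Field2","Field3","item1,item2,item3"'

# For each comma: (number of quotes after it, does the regex match there?)
report = []
for pos, ch in enumerate(line):
    if ch == ',':
        quotes_after = line.count('"', pos + 1)
        matched = PATTERN.match(line, pos) is not None
        report.append((quotes_after, matched))

for quotes_after, matched in report:
    print(quotes_after, "quotes after ->", "split" if matched else "no split")

# Splitting on the matching commas keeps the quoted fields intact.
fields = PATTERN.split(line)
print(fields)
```

Every comma with an even count is a split point and every comma with an odd count is skipped, which is the rule the lookahead encodes.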

Upvotes: 4

vks

Reputation: 67968

,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)

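This will not split on a , that is inside " and ". The lookahead says that after every matched , the rest of the string must consist of groups of the form something " something ", that is, the remaining quotes must come in pairs. So effectively a matched , cannot be in between " and ".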
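
As a quick check of the above against the sample line from the question, here is a minimal Python sketch (Python is my assumption; the question does not name a language, and the variable names are made up). It uses the pattern exactly as posted, with \" escapes, next to the same pattern without them, since in a regex \" simply matches a literal ".

```python
import re

# Pattern exactly as written in the question (with \" escapes) ...
escaped = re.compile(r',(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)')
# ... and the same pattern without the escapes.
plain = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

# Sample line from the question; "" inside a field is an escaped quote.
line = '"Field1","Field2","item1,item2,item3","Hello,""John"""'

fields = escaped.split(line)
for field in fields:
    print(field)

# Both spellings of the pattern split the line the same way.
same = fields == plain.split(line)
print(same)
```

The doubled quotes in the last field still come in pairs, so the count after each outer comma stays even and the comma after Hello is left alone. Note that the fields keep their surrounding quotes; the regex only decides where to split.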

Upvotes: 3
