Reputation: 323
I have a file that is formatted like this:
"A", "B", "test "C"", "D"
I'm trying to get this output with a regular expression:
A, B, test "C", D
I'm trying to remove the "outside" quotation marks.
This is my regular expression: ("(.*?)",|,"(.*?)")
but it doesn't work properly when a field itself contains quotes, like "test "C"".
Upvotes: 1
Views: 3098
Reputation: 4089
Regex is generally poor at handling nested patterns such as quotes, but when you only need to capture the outermost pair of quotes you can rely on greediness.
s/(?:"([^,]*)")/\1/g
https://regex101.com/r/olTWpF/1
Your approach had some good ideas, but using the reluctant quantifier *? instead of the greedy * meant that your pattern stopped at the first closing quote it came to. My solution greedily captures every non-delimiting (non-comma) character and then backtracks to the last quote before the delimiter. This means that the pattern accepts and skips over interior quotes.
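To make the greedy versus reluctant difference concrete, here is a minimal Python sketch. It assumes Python's `re` module rather than the `s///` syntax above, and the reluctant pattern is a simplified stand-in for the one in the question, not the exact original.

```python
import re

line = '"A", "B", "test "C"", "D"'

# Greedy: [^,]* runs up to the next comma (or end of line), then backtracks
# to the last quote in that stretch, so interior quotes stay in the capture.
greedy = re.sub(r'"([^,]*)"', r'\1', line)
print(greedy)  # A, B, test "C", D

# Reluctant (simplified stand-in): .*? stops at the first closing quote,
# so the quotes around C are treated as field boundaries and removed.
reluctant = re.sub(r'"(.*?)"', r'\1', line)
print(reluctant)  # A, B, test C, D
```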
@pariesz has correctly pointed out that this regex will have issues with commas inside the quoted data.
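The comma limitation can be reproduced in Python, again as a sketch only. The alternative below is not part of the answer: `split_fields` is a hypothetical helper, and it assumes every line starts and ends with a quote and that fields are always separated by exactly `", "`.

```python
import re

with_comma = '"a, b", "c"'

# [^,]* cannot cross the comma inside the first field, so that field
# never matches and keeps its quotes.
print(re.sub(r'"([^,]*)"', r'\1', with_comma))  # "a, b", c

def split_fields(line: str) -> list[str]:
    """Hypothetical helper: drop the first and last quote, then split on '", "'."""
    return line.strip()[1:-1].split('", "')

print(split_fields(with_comma))                   # ['a, b', 'c']
print(split_fields('"A", "B", "test "C"", "D"'))  # ['A', 'B', 'test "C"', 'D']
```

Splitting on the full delimiter avoids guessing which quotes are interior, but it breaks if a field itself contains the sequence `", "`.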
Upvotes: 1
Reputation: 3426
This should work:
(^"|")(.*?)(", |"$)
With the following substitution:
$2,
https://regex101.com/r/daPqGa/1
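For anyone trying this outside regex101, here is a rough Python equivalent. It assumes Python's `re` module, where the `$2` backreference is written `\2`. As far as I can tell, the substitution as posted leaves a trailing comma and drops the spaces, so the second variant (my own adjustment, not from the answer) adds the space back and trims the trailing separator.

```python
import re

line = '"A", "B", "test "C"", "D"'
pattern = r'(^"|")(.*?)(", |"$)'

# Substitution as posted: every field, including the last, gets a comma.
as_posted = re.sub(pattern, r'\2,', line)
print(as_posted)  # A,B,test "C",D,

# Assumed variation: emit ", " after each field, then trim the trailing
# separator. rstrip would also eat a comma or space that ends the last field.
spaced = re.sub(pattern, r'\2, ', line).rstrip(', ')
print(spaced)  # A, B, test "C", D
```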
Upvotes: 0