dnwilson

What is the best way to split a CSV file whose fields contain commas and double quotes?

Let's say I have the following string, and I want the output below without requiring csv (that is, without Ruby's CSV library).

this, "what I need", to, do, "i, want, this", to, work

this
what I need
to
do
i, want, this
to
work

zx81

This problem is a classic case of the technique explained in this question: how to "regex-match a pattern, excluding..."

We can solve it with a beautifully simple regex:

"([^"]+)"|[^, ]+

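The left side of the alternation | matches a complete quoted string, such as "what I need", and captures its contents (without the quotes) in Group 1. The right side matches runs of characters that are neither commas nor spaces. We know these are the right ones because they were not matched by the expression on the left: the engine tries the left side first at each position, so a quoted string is consumed whole before the right side can match any word inside it (assuming the quotes are balanced).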
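To see the two sides of the alternation at work, here is a small Ruby sketch (not part of the original answer) that records, for each match, whether Group 1 was set. It relies only on String#scan and the $~ match variable, which Ruby sets for every call of the scan block; the kinds array and the :quoted / :bare labels are illustrative names introduced here.

```ruby
subject = 'this, "what I need", to, do, "i, want, this", to, work'
regex = /"([^"]+)"|[^, ]+/

# For every match, note which side of the alternation produced it.
# Inside the scan block, $~ holds the MatchData of the current match.
kinds = []
subject.scan(regex) do
  md = $~
  if md[1]
    kinds << [:quoted, md[1]] # left side: quote contents captured in Group 1
  else
    kinds << [:bare, md[0]]   # right side: Group 1 is nil, use the whole match
  end
end

kinds.each { |kind, text| puts "#{kind}: #{text}" }
```

The quoted field "i, want, this" comes out as a single :quoted entry, because the left alternative consumes the whole quoted string, commas included, before the right side gets a chance to match "i", "want" or "this" on their own.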

Option 2: Allowing Multiple Words

In your input, every unquoted token is a single word. If you also want the regex to work when unquoted fields contain several words, as in

my cat scratches, "what I need", your dog barks

use this:

"([^"]+)"|[^, ]+(?:[ ]*[^, ]+)*

The only difference is the addition of (?:[ ]*[^, ]+)*, which lets an unquoted match continue through optional spaces followed by more characters that are neither commas nor spaces, zero or more times.
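A quick way to compare the two patterns is to run both on the multi-word input. This Ruby sketch keeps the Group 1 capture when a quoted string matched and the whole match otherwise; the tokens helper and the single_word / multi_word names are hypothetical, introduced only for this comparison.

```ruby
# Hypothetical helper: Group 1 for a quoted field, otherwise the whole match.
def tokens(subject, regex)
  result = []
  subject.scan(regex) { result << ($1 || $&) }
  result
end

single_word = /"([^"]+)"|[^, ]+/
multi_word  = /"([^"]+)"|[^, ]+(?:[ ]*[^, ]+)*/
subject = 'my cat scratches, "what I need", your dog barks'

p tokens(subject, single_word) # splits the unquoted fields at every space
p tokens(subject, multi_word)  # keeps each unquoted field together
```

On the question's original input both patterns return the same seven tokens, since every unquoted token there is a single word.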

This program shows how to use the regex (see the results at the bottom of the online demo):

subject = 'this, "what I need", to, do, "i, want, this", to, work'
regex = /"([^"]+)"|[^, ]+/
# Collect each token: the Group 1 capture for a quoted string, otherwise the whole match
mymatches = []
subject.scan(regex) do
  mymatches << ($1 || $&)
end
mymatches.each { |x| puts x }

Output

this
what I need
to
do
i, want, this
to
work
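If you would rather not rely on the $1 and $& globals, one variant (an alternative sketch, not the answer's own program) is to capture the unquoted side in a second group as well. When a pattern contains groups, String#scan without a block returns one array of group captures per match, so each match can be reduced to whichever group is non-nil. The FIELD constant and the split_fields method are hypothetical names.

```ruby
# Variant: Group 1 = contents of a quoted field, Group 2 = an unquoted token.
FIELD = /"([^"]+)"|([^, ]+)/

# Hypothetical helper; returns the fields as an array of strings.
def split_fields(line)
  # scan yields [quoted, bare] pairs; only one side of the alternation matched,
  # so exactly one of the two captures is non-nil.
  line.scan(FIELD).map { |quoted, bare| quoted || bare }
end

subject = 'this, "what I need", to, do, "i, want, this", to, work'
puts split_fields(subject) # puts prints each array element on its own line
```

Returning an array instead of printing inside the loop makes the result easier to reuse, and the printed output matches the listing above.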

Reference
