Reputation: 33223
I am trying to parse the following string
s1 = """ "foo","bar", "foo,bar" """
The output I am hoping for from this parsing is a list of length 3:
["foo","bar","foo,bar"]
I am able to parse the following
s2 = """ "foo","bar", 'foo,bar' """
By using the following pattern
pattern = "(('[^']*')|([^,]+))"
re.findall(pattern,s2)
gives [('foo', '', 'foo'), ('bar', '', 'bar'), ("'foo,bar'", "'foo,bar'", '')]
But I am not able to figure out the pattern for s1. Note that I need to parse both s1 and s2 successfully.
Edit
The current pattern supports strings like
"foo,bar,foo bar" => [foo,bar,foo bar]
"foo,bar,'foo bar'" => ["foo","bar",'foo bar']
"foo,bar,'foo, bar'" => [foo,bar, 'foo, bar'] #length 3
Upvotes: 0
Views: 108
Reputation: 2361
I think that shlex (simple lexical analysis) is a much simpler solution here, since the regex gets too complicated. Specifically, I'd use:
>>> import shlex
>>> lex = shlex.shlex(""" "foo","bar", 'foo,bar' """, posix=True)
>>> lex.whitespace = ',' # Only comma will be a splitter
>>> lex.whitespace_split=True # Split by any delimiter defined in whitespace
>>> list(lex) # lex is an iterator, so list() collects its tokens
['foo', 'bar', 'foo,bar']
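The same idea can be wrapped in a small helper so that both s1 and s2 go through one code path. This is only a sketch: the function name `split_quoted` is made up for illustration, and it assumes that spaces outside the quotes are not wanted. Because only the comma is treated as a separator, spaces next to the quotes may end up in the tokens, so the sketch strips each token.

```python
import shlex

def split_quoted(text):
    """Split comma-separated fields, honouring both '...' and "..." quoting."""
    lex = shlex.shlex(text, posix=True)  # posix=True removes the quotes from tokens
    lex.whitespace = ','                 # only the comma separates fields
    lex.whitespace_split = True          # split on the separator only
    # Assumption: stray spaces around a field are not significant, so strip them.
    return [token.strip() for token in lex]

s1 = """ "foo","bar", "foo,bar" """
s2 = """ "foo","bar", 'foo,bar' """
print(split_quoted(s1))
print(split_quoted(s2))
```

Note that stripping would also drop leading or trailing spaces that were deliberately placed inside a quoted field.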
Edit:
I have a feeling that you're trying to read a CSV file. Have you tried the csv module (import csv)?
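If the data really is CSV, something like the following might be enough for the double-quoted case (s1). It is a minimal sketch, and `parse_double_quoted` is a hypothetical helper name. The csv module works with a single quote character at a time (`"` by default), so a line that mixes both quote styles, like s2, would not be handled by this version.

```python
import csv

def parse_double_quoted(line):
    """Parse one line of comma-separated, double-quoted fields."""
    # strip() removes the outer whitespace so a trailing space is not
    # appended to the last field; skipinitialspace ignores the space
    # that follows a comma.
    reader = csv.reader([line.strip()], skipinitialspace=True)
    return next(reader)

s1 = """ "foo","bar", "foo,bar" """
print(parse_double_quoted(s1))
```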
Upvotes: 4
Reputation: 20163
This works:
(?:"([^"]+)"|'([^']+)')
Capture group 1 or 2 contains the desired output, so each element can be built as $1$2, because exactly one of the two will always be empty.
Updated for the new requirements, as described in the comments on Haidro's answer:
(?:("[^"]+")|('[^']+')|(\w+))
Each element is now $1$2$3.
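In Python, the "$1$2$3" idea could be written by joining the groups that re.findall returns for each match. This is a sketch, and `split_fields` is an illustrative name. With this pattern the quotes stay part of each quoted element, and an unquoted field containing a space would be split by \w+ into separate words.

```python
import re

# The three-alternative pattern from above, compiled once.
PATTERN = re.compile(r"""(?:("[^"]+")|('[^']+')|(\w+))""")

def split_fields(text):
    # findall returns one 3-tuple per match; exactly one group is
    # non-empty, so joining the tuple yields the matched element.
    return ["".join(groups) for groups in PATTERN.findall(text)]

s2 = """ "foo","bar", 'foo,bar' """
print(split_fields(s2))
print(split_fields("foo,bar,'foo, bar'"))
```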
Upvotes: 1
Reputation: 59974
Maybe you could use something like this:
>>> re.findall(r'["\'](.*?)["\']', s1)
['foo', 'bar', 'foo,bar']
>>> re.findall(r'["\'](.*?)["\']', s2)
['foo', 'bar', 'foo,bar']
This finds all the words inside of "..." or '...' and groups them.
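One caveat with a pattern of this shape is that the closing quote does not have to be the same character as the opening one. A possible variant, not part of the answer above, uses a backreference so that each field must end with the quote it started with. The name `quoted_fields` is made up for this sketch.

```python
import re

# Group 1 captures the opening quote, \1 requires the same closing quote.
QUOTED = re.compile(r"""(["'])(.*?)\1""")

def quoted_fields(text):
    # Group 2 holds the text between the matching pair of quotes.
    return [match.group(2) for match in QUOTED.finditer(text)]

s1 = """ "foo","bar", "foo,bar" """
s2 = """ "foo","bar", 'foo,bar' """
print(quoted_fields(s1))
print(quoted_fields(s2))
```

Like the original pattern, this only picks up quoted fields; unquoted ones are ignored.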
Upvotes: 2