frazman

Reputation: 33223

How to parse a string using regex?

I am trying to parse the following string:

 s1 = """ "foo","bar", "foo,bar" """

The output I am hoping for is a list of length 3:

 ["foo", "bar", "foo,bar"]

I am able to parse the following:

s2 = """ "foo","bar", 'foo,bar' """

by using the following pattern:

pattern = "(('[^']*')|([^,]+))"
re.findall(pattern,s2)
gives [('foo', '', 'foo'), ('bar', '', 'bar'), ("'foo,bar'", "'foo,bar'", '')]

But I am not able to figure out the pattern for s1. Note that I need to parse both s1 and s2 successfully.

Edit:

The current pattern supports strings like:

   "foo,bar,foo bar" => [foo, bar, foo bar]
   "foo,bar,'foo bar'" => [foo, bar, 'foo bar']
   "foo,bar,'foo, bar'" => [foo, bar, 'foo, bar']  # length 3

Upvotes: 0

Views: 108

Answers (3)

tmrlvi

Reputation: 2361

I think shlex (simple lexical analysis) is a much simpler solution here, since the regex gets too complicated. Specifically, I'd use:

>>> import shlex
>>> lex = shlex.shlex(""" "foo","bar", 'foo,bar' """, posix=True)
>>> lex.whitespace = ','        # Only comma will be a splitter
>>> lex.whitespace_split=True   # Split by any delimiter defined in whitespace
>>> [token.strip() for token in lex]   # lex is an iterator; strip the spaces kept around each field
['foo', 'bar', 'foo,bar']

Edit:

I have a feeling that you're trying to read a CSV file. Have you tried the csv module (import csv)?
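
If the input really is CSV-like, a minimal sketch with the standard csv module could look like the following. The helper name parse_csv_line is made up for illustration, and it assumes one record per string. Note that csv.reader accepts only a single quotechar, so the double-quoted s1 and the single-quoted field in s2 need different settings.

```python
import csv

def parse_csv_line(line, quotechar='"'):
    """Split one comma-separated record, honouring a single quote character.

    parse_csv_line is a hypothetical helper name, not part of the csv module.
    """
    # skipinitialspace drops the blank that follows each comma, so a field
    # written as ` "foo,bar"` is still recognised as a quoted field.
    reader = csv.reader([line.strip()], quotechar=quotechar, skipinitialspace=True)
    return next(reader)

s1 = ''' "foo","bar", "foo,bar" '''
print(parse_csv_line(s1))  # ['foo', 'bar', 'foo,bar']
```

Calling the same helper on s2 with quotechar="'" keeps 'foo,bar' together but leaves the double quotes on the other fields, which is why shlex is probably the better fit when both quote styles are mixed.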

Upvotes: 4

aliteralmind

Reputation: 20163

This works:

(?:"([^"]+)"|'([^']+)')

Capture group 1 or 2 contains the desired output, so each element can be built as $1$2: for any match, exactly one of the two groups is empty.


Updated to the new requirements as in the comments to Haidro's answer:

(?:("[^"]+")|('[^']+')|(\w+))

Each element is now $1$2$3.
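
In Python, the idea of concatenating the groups can be sketched with re.finditer, joining the groups of each match. This variant keeps the quotes outside the capture groups (as in the first pattern) so the fields come back unquoted. FIELD and parse_fields are names made up here for illustration.

```python
import re

# Three alternatives: a double-quoted field, a single-quoted field, or a bare word.
# The quotes sit outside the groups, so they are not part of the captured text.
FIELD = re.compile(r'''"([^"]+)"|'([^']+)'|(\w+)''')

def parse_fields(text):
    """Return the fields found in text (hypothetical helper)."""
    # Only one group takes part in each match; default="" turns the others
    # into empty strings, so joining the groups yields the field itself.
    return ["".join(m.groups(default="")) for m in FIELD.finditer(text)]

s1 = ''' "foo","bar", "foo,bar" '''
s2 = ''' "foo","bar", 'foo,bar' '''
print(parse_fields(s1))  # ['foo', 'bar', 'foo,bar']
print(parse_fields(s2))  # ['foo', 'bar', 'foo,bar']
```

Because the last alternative is \w+, an unquoted field that contains a space, such as foo bar, comes back as two elements.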

Upvotes: 1

TerryA

Reputation: 59974

Maybe you could use something like this:

>>> import re
>>> re.findall(r'["\'](.*?)["\']', s1)
['foo', 'bar', 'foo,bar']
>>> re.findall(r'["\'](.*?)["\']', s2)
['foo', 'bar', 'foo,bar']

This captures the text between each pair of quote characters, whether "..." or '...'.
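
One caveat: the pattern accepts either quote character as the closing delimiter, so an apostrophe inside a double-quoted field ends the match early. A possible variant, offered only as a sketch, captures the opening quote and uses a backreference to require the same character to close the field. QUOTED and quoted_fields are made-up names.

```python
import re

# Group 1 remembers the opening quote; \1 demands the same character to close.
QUOTED = re.compile(r'''(["'])(.*?)\1''')

def quoted_fields(text):
    """Return the contents of every quoted field in text (hypothetical helper)."""
    # finditer is used instead of findall so that only group 2 (the contents)
    # is returned, not a tuple that also holds the quote character.
    return [m.group(2) for m in QUOTED.finditer(text)]

print(quoted_fields(''' "foo","bar", 'foo,bar' '''))  # ['foo', 'bar', 'foo,bar']
print(quoted_fields('''"it's fine","bar"'''))          # ["it's fine", 'bar']
```

Like the pattern above, this only picks up quoted fields; unquoted ones are skipped.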

Upvotes: 2
