skilty

Reputation: 1

Split Python String by single quotation marks

I have a string like this:

text = ['Adult'   'Adverse Drug Reaction Reporting Systems/*classification'   '*Drug-Related Side Effects and Adverse Reactions'   'Hospital Bed Capacity   300 to 499'   'Hospitals   County'   'Humans'   'Indiana'   'Pharmacy Service   Hospital/*statistics & numerical data']

I need to split this string so that each category (delimited by single quotation marks) is stored as a separate element of an array. For example:

text = Adult, Adverse Drug Reaction Reporting Systems...

I have experimented with the split function but am unsure how to do it.

Upvotes: 0

Views: 811

Answers (1)

mgilson

Reputation: 310227

You can do something like this with a regex, assuming there are no constraints beyond the ones you've listed:

>>> s = "'Adult'   'Adverse Drug Reaction Reporting Systems/*classification'   '*Drug-Related Side Effects and Adverse Reactions'   'Hospital Bed Capacity   300 to 499'   'Hospitals   County'   'Humans'   'Indiana'   'Pharmacy Service   Hospital/*statistics & numerical data'"
>>> import re
>>> regex = re.compile(r"'[^']*'")
>>> regex.findall(s)
["'Adult'", "'Adverse Drug Reaction Reporting Systems/*classification'", "'*Drug-Related Side Effects and Adverse Reactions'", "'Hospital Bed Capacity   300 to 499'", "'Hospitals   County'", "'Humans'", "'Indiana'", "'Pharmacy Service   Hospital/*statistics & numerical data'"]

The regex leaves the surrounding ' on each match. You can easily remove them with str.strip("'"):

>>> [x.strip("'") for x in regex.findall(s)]
['Adult', 'Adverse Drug Reaction Reporting Systems/*classification', '*Drug-Related Side Effects and Adverse Reactions', 'Hospital Bed Capacity   300 to 499', 'Hospitals   County', 'Humans', 'Indiana', 'Pharmacy Service   Hospital/*statistics & numerical data']
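As a possible alternative to the strip step (a sketch, not part of the answer's original approach): if the pattern contains one capturing group, re.findall returns the group contents instead of the whole match, so the quotes are never included. The shortened sample string and the name categories are made up for illustration.

```python
import re

# Shortened sample in the same shape as the question's data (illustrative only).
s = "'Adult'   'Hospital Bed Capacity   300 to 499'   'Humans'"

# With a single capturing group, re.findall returns what the group matched
# rather than the full match, so the surrounding quotes are dropped.
categories = re.findall(r"'([^']*)'", s)
print(categories)
```

Like the original pattern, this still assumes the string contains no escaped quotes.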

Note that this only works if the string contains no escaped quotes, e.g. you never have:

'foo\'bar', which is a perfectly valid way to write a string in many programming contexts. If you do have that situation, you'll need a more robust parser, e.g. pyparsing:

>>> import pyparsing as pp
>>> [x[0][0].strip("'") for x in pp.sglQuotedString.scanString(s)]
['Adult', 'Adverse Drug Reaction Reporting Systems/*classification', '*Drug-Related Side Effects and Adverse Reactions', 'Hospital Bed Capacity   300 to 499', 'Hospitals   County', 'Humans', 'Indiana', 'Pharmacy Service   Hospital/*statistics & numerical data']
>>> s2 = r"'foo\'bar' 'baz'"
>>> [x[0][0].strip("'") for x in pp.sglQuotedString.scanString(s2)]
["foo\\'bar", 'baz']
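If installing pyparsing is not an option, a regex can be extended to tolerate backslash escapes. The following is only a minimal sketch under the assumption that a backslash always escapes the single character after it; the names QUOTED and split_quoted are hypothetical helpers, not part of any library. As with the pyparsing output above, the backslash is left in the result rather than being unescaped.

```python
import re

# Between the quotes, accept either a backslash followed by any character
# (an escape sequence) or any character that is neither a quote nor a backslash.
QUOTED = re.compile(r"'((?:\\.|[^'\\])*)'")

def split_quoted(text):
    """Return the contents of each single-quoted chunk found in text."""
    return QUOTED.findall(text)

s2 = r"'foo\'bar' 'baz'"
print(split_quoted(s2))
```

For input with more involved quoting rules (nested quoting, mixed quote styles), a real parser such as pyparsing is likely the safer choice.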

Upvotes: 1
