Reputation: 767
I want to extract the variable names from the following string (i.e. the names surrounded by ' ').
Case1:
string = r"""RESPONSE(1, -2.532 + 0.779*(LN('Loss_Ratio')) +SELECT(INDEX_FIRST_TRUE('POL_Zero'="No"),2.261,0.0) +SELECT(INDEX_FIRST_TRUE('POL_children'="Si"),0.307,0.0))"""
When I apply
all_variables = list(set(re.findall("'([^']*)'", string)))
I get the correct result:
all_variables = ['Loss_Ratio','POL_Zero','POL_children']
But in Case 2 (when the POL_Zero modality is changed):
string = r"""RESPONSE(1, -2.532 + 0.779*(LN('Loss_Ratio')) +SELECT(INDEX_FIRST_TRUE('POL_Zero'="Nos' conditional"),2.261,0.0) +SELECT(INDEX_FIRST_TRUE('POL_children'="Si"),0.307,0.0))"""
The same regex produces the wrong result. How can I still obtain the correct result in Case 2 as well?
Note that there can be no single or double quotes inside the names.
Upvotes: 1
Views: 79
Reputation: 627537
You may leverage the fact that your single-quoted strings can contain neither single nor double quotes.
Only in this situation, the
"""'([^"']*)'"""
regex will work as expected. See the regex demo.
Here,
' - matches a single quote
([^"']*) - Group 1 (if you are using re.findall, only this part will be present in the output): zero or more (*) chars other than " and ' ([^"'])
' - closing single quote.
import re
s = """RESPONSE(1, -2.532 + 0.779*(LN('Loss_Ratio')) +SELECT(INDEX_FIRST_TRUE('POL_Zero'="No"),2.261,0.0) +SELECT(INDEX_FIRST_TRUE('POL_children'="Si"),0.307,0.0))
RESPONSE(1, -2.532 + 0.779*(LN('Loss_Ratio')) +SELECT(INDEX_FIRST_TRUE('POL_Zero'="Nos' conditional"),2.261,0.0) +SELECT(INDEX_FIRST_TRUE('POL_children'="Si"),0.307,0.0))"""
print(re.findall(r"""'([^"']*)'""", s))
# => ['Loss_Ratio', 'POL_Zero', 'POL_children', 'Loss_Ratio', 'POL_Zero', 'POL_children']
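To see why the original pattern fails on Case 2 and to get each name only once, here is a minimal sketch. The names case2, NAIVE, STRICT and extract_names are hypothetical, introduced only for this illustration. It assumes, as stated in the question, that variable names never contain single or double quotes; a double-quoted value that itself contains a pair of single quotes with no double quote between them would still produce a false match.

```python
import re

# Case 2 from the question: a stray single quote inside a double-quoted value.
case2 = (
    """RESPONSE(1, -2.532 + 0.779*(LN('Loss_Ratio')) """
    """+SELECT(INDEX_FIRST_TRUE('POL_Zero'="Nos' conditional"),2.261,0.0) """
    """+SELECT(INDEX_FIRST_TRUE('POL_children'="Si"),0.307,0.0))"""
)

# Pattern from the question: only single quotes are excluded from the group.
NAIVE = re.compile(r"'([^']*)'")
# Pattern from the answer: both quote characters are excluded from the group.
STRICT = re.compile(r"""'([^"']*)'""")


def extract_names(text, pattern=STRICT):
    """Return the captured names without duplicates, in first-seen order."""
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(pattern.findall(text)))


naive_names = extract_names(case2, NAIVE)
strict_names = extract_names(case2)

# The naive pattern pairs the stray quote after "Nos" with the opening
# quote of 'POL_children', so POL_children is lost.
print(naive_names)
print(strict_names)
# => ['Loss_Ratio', 'POL_Zero', 'POL_children']
```

Unlike list(set(...)), dict.fromkeys keeps the names in the order they first appear in the string, which makes the result reproducible between runs.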
Upvotes: 1