Reputation: 450
Given a Python list, I want to split the values based on certain criteria:
list = ['(( value(name) = literal(luke) or value(like) = literal(music) )
and (value(PRICELIST) in propval(valid))',
'(( value(sam) = literal(abc) or value(like) = literal(music) ) and
(value(PRICELIST) in propval(valid))']
Now list[0] would be
(( value(name) = literal(luke) or value(like) = literal(music) )
and (value(PRICELIST) in propval(valid))
I want to split it so that iterating over the result gives me:
#expected output
value(sam) = literal(abc)
value(like) = literal(music)
I only want pairs of the form `value(...) = literal(...)`. At first I thought of splitting on `and`/`or`, but that won't work because sometimes the `and`/`or` may be missing.
I tried:
for i in list:
    print(i.split())
#output ['((', 'value(abc)', '=', 'literal(12)', 'or' ....
I am also open to regex-based suggestions, but I know little about regex, so I would prefer not to use it.
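Since regex is not preferred, here is a minimal sketch that builds on the `split()` attempt above: split on whitespace, trim stray outer parentheses from each token, then keep every run of three consecutive tokens that looks like `value(...)`, `=`, `literal(...)`. It assumes the operands and `=` are separated by spaces, as in the sample data. `data`, `trim_parens` and `value_literal_pairs` are illustrative names (not from any answer), and `data` is used instead of `list` so the built-in is not shadowed.

```python
# Illustrative data: the question's strings written as valid Python literals.
data = [
    '(( value(name) = literal(luke) or value(like) = literal(music) ) '
    'and (value(PRICELIST) in propval(valid))',
    '(( value(sam) = literal(abc) or value(like) = literal(music) ) '
    'and (value(PRICELIST) in propval(valid))',
]


def trim_parens(token):
    """Strip leading '(' and any unmatched trailing ')' from a token."""
    token = token.lstrip('(')
    while token.endswith(')') and token.count(')') > token.count('('):
        token = token[:-1]
    return token


def value_literal_pairs(expression):
    """Return 'value(x) = literal(y)' pairs; assumes '=' is space-separated."""
    tokens = [trim_parens(t) for t in expression.split()]
    pairs = []
    # Slide a window of three consecutive tokens over the expression.
    for left, op, right in zip(tokens, tokens[1:], tokens[2:]):
        if left.startswith('value(') and op == '=' and right.startswith('literal('):
            pairs.append(f'{left} = {right}')
    return pairs


for expression in data:
    for pair in value_literal_pairs(expression):
        print(pair)
```

Because it looks at neighbouring tokens instead of splitting on `and`/`or`, a missing `and` or `or` between two pairs does not break it.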
Upvotes: 1
Views: 523
Reputation: 43169
Not saying you should, but you definitely could use a PEG parser here:
from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

data = ['(( value(name) = literal(luke) or value(like) = literal(music) ) and (value(PRICELIST) in propval(valid))',
        '(( value(sam) = literal(abc) or value(like) = literal(music) ) and (value(PRICELIST) in propval(valid))']

grammar = Grammar(
    r"""
    expr = term (operator term)*
    term = lpar* factor (operator needle)* rpar*
    factor = needle operator needle
    needle = word lpar word rpar
    operator = ws? ("=" / "or" / "and" / "in") ws?
    word = ~"\w+"
    lpar = "(" ws?
    rpar = ws? ")"
    ws = ~r"\s*"
    """
)

class HorribleStuff(NodeVisitor):
    def generic_visit(self, node, visited_children):
        return node.text or visited_children

    def visit_factor(self, node, children):
        output, equal = [], False
        for child in node.children:
            if child.expr.name == 'needle':
                output.append(child.text)
            elif child.expr.name == 'operator' and child.text.strip() == '=':
                equal = True
        if equal:
            print(output)

for d in data:
    tree = grammar.parse(d)
    hs = HorribleStuff()
    hs.visit(tree)
This yields
['value(name)', 'literal(luke)']
['value(sam)', 'literal(abc)']
Upvotes: 0
Reputation: 482
To avoid too much clutter, I'm going to explain the solution in this comment. I hope that's okay.
Given your comment above which I couldn't quite understand, is this what you want? I changed the list to add in the other values you mentioned:
>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) )
... and (value(PRICELIST) in propval(valid))''',
... '''(( value(sam) = literal(abc) or value(like) = literal(music) ) and
... (value(PRICELIST) in propval(valid))''',
... '''(value(PICK_SKU1) = propval(._sku)''', '''propval(._amEntitled) > literal(0))''']
>>> found_list = []
>>> for item in list:
...     for element in re.findall(r'([\w\.]+(?:\()[\w\.]+(?:\))[\s=<>(?:in)]+[\w\.]+(?:\()[\w\.]+(?:\)))', item):
...         found_list.append(element)
...
>>> found_list
['value(name) = literal(luke)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(sam) = literal(abc)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(PICK_SKU1) = propval(._sku)', 'propval(._amEntitled) > literal(0)']
Explanation:
1. I changed `[a-zA-Z0-9\._]+` to `[\w\.]+` because they mean essentially the same thing (for ASCII text), but the second is more concise. I explain which characters they cover in the next step.
2. `([\w\.]+`: note that the group is "unclosed", meaning I am priming the regex to capture everything in the rest of the pattern. I am telling it to begin by capturing one or more characters in the ranges `a-z`, `A-Z` and `0-9`, plus `_` and an escaped period (`\.`).
3. `(?:\()`: the captured text should then contain an escaped "opening" parenthesis (`(`).
4. `[\w\.]+(?:\))`: follow that parenthesis with the word characters from step 2 again, and then, through `(?:\))`, with an escaped "closing" parenthesis (`)`).
5. `[\s=<>(?:in)]+`: this is kind of reckless, but for the sake of readability, and assuming your strings remain relatively consistent, it says that the closing parenthesis should be followed by any run of whitespace, `=`, `<`, `>` or the letters `i` and `n`, in any order and any number of times. Inside square brackets `(?:in)` is not a group, so the class also accepts the individual characters `(`, `?`, `:` and `)`. It is reckless because it will also match things like `<< <` or `= in > =`. Making it more specific could easily result in a loss of captures, though.
6. `[\w\.]+(?:\()[\w\.]+(?:\))`: once again, find the word characters from step 2, followed by an opening parenthesis, followed again by the word characters, followed by a closing parenthesis.
7. `)`: this closes the "unclosed" capture group from step 2, telling the regex engine to capture the entire pattern outlined above.

Hope this helps.
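Because the character class in step 5 matches single characters rather than whole operators, one way to make it stricter while keeping the captures shown above is to replace it with an alternation of complete operators. This is a sketch under the assumption that every operand looks like `name(argument)`; `OPERAND`, `PAIR` and `find_pairs` are illustrative names, not part of the answer above.

```python
import re

# One operand: a name followed by a parenthesised argument,
# e.g. value(name), literal(0) or propval(._amEntitled).
OPERAND = r'[\w.]+\([\w.]*\)'

# Exactly one operator between the operands: '=', '<', '>' or the whole word 'in'.
PAIR = re.compile(
    rf'(?P<left>{OPERAND})\s*(?P<op>=|<|>|\bin\b)\s*(?P<right>{OPERAND})'
)


def find_pairs(text):
    """Return each 'left op right' comparison, normalised to single spaces."""
    return [f"{m['left']} {m['op']} {m['right']}" for m in PAIR.finditer(text)]


strings = [
    '(( value(name) = literal(luke) or value(like) = literal(music) )\n'
    'and (value(PRICELIST) in propval(valid))',
    '(value(PICK_SKU1) = propval(._sku)',
    'propval(._amEntitled) > literal(0))',
]

for s in strings:
    print(find_pairs(s))

# Operator soup such as '<< <' is no longer accepted.
print(find_pairs('value(a) << literal(b)'))
```

Two-character operators such as `<=` or `>=` are not handled by this sketch; they would need to be added to the alternation.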
Upvotes: 1
Reputation: 482
@Duck_dragon
The strings in the list in your opening post are formatted in a way that causes a syntax error in Python (a single-quoted string cannot span several lines). In the examples below, I changed them to use triple quotes (''').
>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) )
... and (value(PRICELIST) in propval(valid))''',
... '''(( value(sam) = literal(abc) or value(like) = literal(music) ) and
... (value(PRICELIST) in propval(valid))''']
>>> # Plain findall without assigning the result: the REPL echoes one list per string, but you can't reuse it
>>> # You can also use this equivalent, simpler-looking regex: r'([a-zA-Z]+\([a-zA-Z]+\)[\s=]+[a-zA-Z]+\([a-zA-Z]+\))'
>>> for item in list:
...     re.findall(r'([a-zA-Z]+(?:\()[a-zA-Z]+(?:\))[\s=]+[a-zA-Z]+(?:\()[a-zA-Z]+(?:\)))', item)
...
['value(name) = literal(luke)', 'value(like) = literal(music)']
['value(sam) = literal(abc)', 'value(like) = literal(music)']
To take this a step further and give you a list you can work with:
>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) )
... and (value(PRICELIST) in propval(valid))''',
... '''(( value(sam) = literal(abc) or value(like) = literal(music) ) and
... (value(PRICELIST) in propval(valid))''']
>>> # Declare an empty list, found_list, to collect the individual matches
>>> found_list = []
>>> for item in list:
...     for element in re.findall(r'([a-zA-Z]+(?:\()[a-zA-Z]+(?:\))[\s=]+[a-zA-Z]+(?:\()[a-zA-Z]+(?:\)))', item):
...         found_list.append(element)
...
>>> found_list
['value(name) = literal(luke)', 'value(like) = literal(music)', 'value(sam) = literal(abc)', 'value(like) = literal(music)']
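If only the `value(...) = literal(...)` pairs from the question are wanted, a possible variation is to anchor the pattern on those two names and collect every match with a single list comprehension instead of nested loops and `append`. This is a sketch assuming the arguments are plain word characters; `data` and `VALUE_LITERAL` are illustrative names, and `data` avoids shadowing the built-in `list`.

```python
import re

data = [
    '(( value(name) = literal(luke) or value(like) = literal(music) )\n'
    'and (value(PRICELIST) in propval(valid))',
    '(( value(sam) = literal(abc) or value(like) = literal(music) ) and\n'
    '(value(PRICELIST) in propval(valid))',
]

# Match only comparisons of the form value(...) = literal(...).
VALUE_LITERAL = re.compile(r'value\(\w+\)\s*=\s*literal\(\w+\)')

# The nested comprehension replaces the two for loops and the append calls.
found = [match for item in data for match in VALUE_LITERAL.findall(item)]
print(found)
```

Since the pattern has no capture groups, `findall` returns the whole matched text, so `found` holds the same four strings as `found_list` above.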
Given your comment below which I couldn't quite understand, is this what you want? I changed the list to add in the other values you mentioned:
>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) )
... and (value(PRICELIST) in propval(valid))''',
... '''(( value(sam) = literal(abc) or value(like) = literal(music) ) and
... (value(PRICELIST) in propval(valid))''',
... '''(value(PICK_SKU1) = propval(._sku)''', '''propval(._amEntitled) > literal(0))''']
>>> found_list = []
>>> for item in list:
...     for element in re.findall(r'([\w\.]+(?:\()[\w\.]+(?:\))[\s=<>(?:in)]+[\w\.]+(?:\()[\w\.]+(?:\)))', item):
...         found_list.append(element)
...
>>> found_list
['value(name) = literal(luke)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(sam) = literal(abc)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(PICK_SKU1) = propval(._sku)', 'propval(._amEntitled) > literal(0)']
Edit: Or is this what you want?
>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) )
... and (value(PRICELIST) in propval(valid))''',
... '''(( value(sam) = literal(abc) or value(like) = literal(music) ) and
... (value(PRICELIST) in propval(valid))''']
>>> # Declare an empty list, found_list, to collect the individual matches
>>> found_list = []
>>> for item in list:
...     for element in re.findall(r'([a-zA-Z]+(?:\()[a-zA-Z]+(?:\))[\s=<>(?:in)]+[a-zA-Z]+(?:\()[a-zA-Z]+(?:\)))', item):
...         found_list.append(element)
...
>>> found_list
['value(name) = literal(luke)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(sam) = literal(abc)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)']
Let me know if you need an explanation.
@Fyodor Kutsepin
In your example, replace `your_list_` with the OP's `list` to avoid confusion. Secondly, your `for` loop lacks a `:`, which produces a syntax error.
Upvotes: 1
Reputation: 13
First, I would suggest you avoid naming your variables after built-in functions (such as `list`). Second, you don't need a regex to get the output you mentioned.
For example:
first, rest = your_list_[1].split(') and'):
for item in first[2:].split('or')
    print(item)
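For reference, a runnable version of this split-based idea might look like the sketch below. It assumes every string has the shape `(( A or B ) and ...)` as in the question, and it raises `ValueError` when `) and` is missing. `data` and `split_or_terms` are illustrative names, with `data` standing in for the OP's list. Whitespace is collapsed first because in the question's strings a line break can sit between `)` and `and`.

```python
data = [
    '(( value(name) = literal(luke) or value(like) = literal(music) )\n'
    'and (value(PRICELIST) in propval(valid))',
    '(( value(sam) = literal(abc) or value(like) = literal(music) ) and\n'
    '(value(PRICELIST) in propval(valid))',
]


def split_or_terms(expression):
    """Return the terms joined by 'or' inside the leading '(( ... ) and' group."""
    # Collapse newlines and repeated spaces so ') and' is always found.
    flat = ' '.join(expression.split())
    first, _rest = flat.split(') and', 1)
    # first[2:] drops the leading '(('; splitting on ' or ' (with spaces)
    # avoids cutting names that merely contain "or", such as "color".
    return [item.strip() for item in first[2:].split(' or ')]


for expression in data:
    for item in split_or_terms(expression):
        print(item)
```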
Upvotes: 0