Duck_dragon
Duck_dragon

Reputation: 450

Python split list values based on condition

Given a python list split values based on certain criteria:

    list = ['(( value(name) = literal(luke) or value(like) = literal(music) ) 
     and (value(PRICELIST) in propval(valid))',
    '(( value(sam) = literal(abc) or value(like) = literal(music) ) and 
     (value(PRICELIST) in propval(valid))'] 

Now list[0] would be

  (( value(name) = literal(luke) or value(like) = literal(music) ) 
     and (value(PRICELIST) in propval(valid))

I want to split such that upon iterating it would give me:

#expected output
value(sam) = literal(abc)
value(like) = literal(music)

That too if it starts with value and literal. At first I thought of splitting with and ,or but it won't work as sometimes there could be missing and ,or.

I tried :

for i in list:
i.split()
print(i)
#output ['((', 'value(abc)', '=', 'literal(12)', 'or' .... 

I am open to suggestions based on regex also. But I have little idea about it I prefer not to include it

Upvotes: 1

Views: 523

Answers (4)

Jan
Jan

Reputation: 43169

Not saying you should but you definately could use a PEG parser here:

from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

data = ['(( value(name) = literal(luke) or value(like) = literal(music) ) and (value(PRICELIST) in propval(valid))',
        '(( value(sam) = literal(abc) or value(like) = literal(music) ) and (value(PRICELIST) in propval(valid))']

grammar = Grammar(
    r"""
    expr        = term (operator term)*
    term        = lpar* factor (operator needle)* rpar*
    factor      = needle operator needle

    needle      = word lpar word rpar

    operator    = ws? ("=" / "or" / "and" / "in") ws?
    word        = ~"\w+"

    lpar        = "(" ws?
    rpar        = ws? ")"
    ws          = ~r"\s*"
    """
)

class HorribleStuff(NodeVisitor):
    def generic_visit(self, node, visited_children):
        return node.text or visited_children

    def visit_factor(self, node, children):
        output, equal = [], False

        for child in node.children:
            if (child.expr.name == 'needle'):
                output.append(child.text)
            elif (child.expr.name == 'operator' and child.text.strip() == '='):
                equal = True

        if equal:
            print(output)

for d in data:
    tree = grammar.parse(d)
    hs = HorribleStuff()
    hs.visit(tree)

This yields

['value(name)', 'literal(luke)']
['value(sam)', 'literal(abc)']

Upvotes: 0

FailSafe
FailSafe

Reputation: 482

So to avoid so much clutter, I'm going to explain the solution in this comment. I hope that's okay.

Given your comment above which I couldn't quite understand, is this what you want? I changed the list to add in the other values you mentioned:

>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) ) 
and (value(PRICELIST) in propval(valid))''',
'''(( value(sam) = literal(abc) or value(like) = literal(music) ) and 
(value(PRICELIST) in propval(valid))''',
'''(value(PICK_SKU1) = propval(._sku)''', '''propval(._amEntitled) > literal(0))''']


>>> found_list = []
>>> for item in list:
        for element in re.findall('([\w\.]+(?:\()[\w\.]+(?:\))[\s=<>(?:in)]+[\w\.]+(?:\()[\w\.]+(?:\)))', item):
            found_list.append(element)

>>> found_list
['value(name) = literal(luke)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(sam) = literal(abc)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(PICK_SKU1) = propval(._sku)', 'propval(._amEntitled) > literal(0)']

Explanation:

  • Pre-Note - I changed [a-zA-Z0-9\._]+ to [\w\.]+ because they mean essentially the same thing but one is more concise. I explain what characters are covered by those queries in the next step
  • With ([\w\.]+, noting that it is "unclosed" meaning I am priming the regex to capture everything in the following query, I am telling it to begin by capturing all characters that are in the range a-z, A-Z, and _, and an escaped period (.)
  • With (?:\() I am saying the captured query should contain an escaped "opening" parenthesis (()
  • With [\w\.]+(?:\)) I'm saying follow that parenthesie again with the word characters outlined in the second step, but this time through (?:\)) I'm saying follow them with an escaped "closing" parenthesis ())
  • This [\s=<>(?:in)]+ is kind of reckless but for the sake of readability and assuming that your strings will remain relatively consistent this says, that the "closing parenthesis" should be followed by "whitespace", a =, a <, a >, or the word in, in any order however many times they all occur consistently. It is reckless because it will also match things like << <, = in > =, etc. To make it more specific could easily result in a loss of captures though
  • With [\w\.]+(?:\()[\w\.]+(?:\)) I'm saying once again, find the word characters from step 1, followed by an "opening parenthesis," followed again by the word characters, followed by a "closing parenthesis"
  • With the ) I am closing the "unclosed" capture group (remember the first capture group above started as "unclosed"), to tell the regex engine to capture the entire query I have outlined

Hope this helps

Upvotes: 1

FailSafe
FailSafe

Reputation: 482

@Duck_dragon

Your strings in your list in the opening post were formatted in such a way that they cause a syntax error in Python. In the example I give below, I edited it to use '''

>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) ) 
 and (value(PRICELIST) in propval(valid))''',
'''(( value(sam) = literal(abc) or value(like) = literal(music) ) and 
 (value(PRICELIST) in propval(valid))''']


#Simple findall without setting it equal to a variable so it returns a list of separate strings but which you can't use
#You can also use the *MORE SIMPLE* but less flexible regex:  '([a-zA-Z]+\([a-zA-Z]+\)[\s=]+[a-zA-Z]+\([a-zA-Z]+\))'
>>> for item in list:
        re.findall('([a-zA-Z]+(?:\()[a-zA-Z]+(?:\))[\s=]+[a-zA-Z]+(?:\()[a-zA-Z]+(?:\)))', item)    

    ['value(name) = literal(luke)', 'value(like) = literal(music)']
    ['value(sam) = literal(abc)', 'value(like) = literal(music)']

.

To take this a step further and give you an array you can work with:

>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) ) 
 and (value(PRICELIST) in propval(valid))''',
'''(( value(sam) = literal(abc) or value(like) = literal(music) ) and 
 (value(PRICELIST) in propval(valid))''']


#Declaring blank array found_list which you can use to call the individual items
>>> found_list = []
>>> for item in list:
        for element in re.findall('([a-zA-Z]+(?:\()[a-zA-Z]+(?:\))[\s=]+[a-zA-Z]+(?:\()[a-zA-Z]+(?:\)))', item):
            found_list.append(element)


>>> found_list
['value(name) = literal(luke)', 'value(like) = literal(music)', 'value(sam) = literal(abc)', 'value(like) = literal(music)']

.

Given your comment below which I couldn't quite understand, is this what you want? I changed the list to add in the other values you mentioned:

>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) ) 
and (value(PRICELIST) in propval(valid))''',
'''(( value(sam) = literal(abc) or value(like) = literal(music) ) and 
(value(PRICELIST) in propval(valid))''',
'''(value(PICK_SKU1) = propval(._sku)''', '''propval(._amEntitled) > literal(0))''']


>>> found_list = []
>>> for item in list:
        for element in re.findall('([\w\.]+(?:\()[\w\.]+(?:\))[\s=<>(?:in)]+[\w\.]+(?:\()[\w\.]+(?:\)))', item):
            found_list.append(element)

>>> found_list
['value(name) = literal(luke)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(sam) = literal(abc)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(PICK_SKU1) = propval(._sku)', 'propval(._amEntitled) > literal(0)']

.

Edit: Or is this what you want?

>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) ) 
 and (value(PRICELIST) in propval(valid))''',
'''(( value(sam) = literal(abc) or value(like) = literal(music) ) and 
 (value(PRICELIST) in propval(valid))''']


#Declaring blank array found_list which you can use to call the individual items
>>> found_list = []
>>> for item in list:
        for element in re.findall('([a-zA-Z]+(?:\()[a-zA-Z]+(?:\))[\s=<>(?:in)]+[a-zA-Z]+(?:\()[a-zA-Z]+(?:\)))', item):
            found_list.append(element)


>>> found_list
['value(name) = literal(luke)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(sam) = literal(abc)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)']

Let me know if you need an explanation.

.

@Fyodor Kutsepin

In your example take out your_list_ and replace it with OP's list to avoid confusion. Secondly, your for loop lacks a : producing syntax errors

Upvotes: 1

Oddy
Oddy

Reputation: 13

First, I would suggest you to avoid of naming your variables like build-in functions. Second, you don't need a regex if you want to get the mentioned output.

for example:

first, rest = your_list_[1].split(') and'):
for item in first[2:].split('or')
    print(item)

Upvotes: 0

Related Questions