SAI ANURAG DODDI

Reputation: 45

Deleting certain kinds of list elements in Python using regex

import re

def extraction(parentTag):
    should_retain = True
    for imageTag in parentTag:
        if re.search("^(\d+.+\d)",imageTag) and not re.search("^(\d+.+\d[-^]\w)",imageTag) and not re.search("^(\d+.+\d[-^]\d)",imageTag):
            should_retain = False
            break
    if should_retain:
        return parentTag
    return None
    
expected_input = [
    ['419adf7', '1.0.22-SNAPSSHOT'],
    ['1.0.24', '82e13c1', 'master'],
    ['1.0.25-1618314650'],
    ['1.0.10', '7ad4886'],
    ['1.0.13-1589279873', 'e597811'],
    ['73a3788'],
    
]
expected_input = list(filter(None,list(map(extraction, expected_input))))
print(expected_input)

Current Output = [['1.0.25-1618314650'], ['1.0.13-1589279873', 'e597811']]

Expected Output = [['1.0.25-1618314650'], ['1.0.13-1589279873', 'e597811'], ['419adf7', '1.0.22-SNAPSSHOT'], ['73a3788']]

Also, is there a better way to write the code to get the expected output using regex?

Upvotes: 1

Views: 48

Answers (2)

Wiktor Stribiżew

Reputation: 626747

You can use

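import re

# rx1: a dotted version followed by a hyphen and a suffix, e.g. 1.0.25-1618314650
rx1 = re.compile(r'^\d+\.\d+\.\d+-\w')
# rx2: a bare dotted version with nothing after it, e.g. 1.0.24
rx2 = re.compile(r'^\d+\.\d+\.\d+$')

def extraction(parentTag):
    # Keep a sublist if it has a suffixed version or has no bare version
    return [x for x in parentTag if any(rx1.match(e) for e in x) or not any(rx2.match(e) for e in x)]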

expected_input = [
    ['419adf7', '1.0.22-SNAPSSHOT'],
    ['1.0.24', '82e13c1', 'master'],
    ['1.0.25-1618314650'],
    ['1.0.10', '7ad4886'],
    ['1.0.13-1589279873', 'e597811'],
    ['73a3788'],
]

expected_input = extraction(expected_input)
print(expected_input)

Output:

[['419adf7', '1.0.22-SNAPSSHOT'], ['1.0.25-1618314650'], ['1.0.13-1589279873', 'e597811'], ['73a3788']]


NOTE:

  • A sublist is kept if either of two regex checks passes: at least one item matches ^\d+\.\d+\.\d+-\w (see any(rx1.match(e) for e in x)), or no item matches ^\d+\.\d+\.\d+$ (see not any(rx2.match(e) for e in x)).
  • With your code, you could not access the parent list because map(extraction, expected_input) passes one sublist at a time. You need to pass the whole list of lists as the argument to the extraction function.
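
To see how the two patterns treat individual items, a small check like the one below may help. It is only a sketch: the classify helper is a hypothetical name introduced for illustration and is not part of the answer's code.

```python
import re

rx1 = re.compile(r'^\d+\.\d+\.\d+-\w')  # dotted version plus hyphenated suffix
rx2 = re.compile(r'^\d+\.\d+\.\d+$')    # bare dotted version only

def classify(item):
    """Hypothetical helper: label one string by the pattern it matches."""
    if rx1.match(item):
        return 'suffixed version'
    if rx2.match(item):
        return 'bare version'
    return 'other'

for item in ['1.0.22-SNAPSSHOT', '1.0.24', '419adf7', 'master']:
    print(item, '->', classify(item))
```

A sublist containing a 'suffixed version' item is always kept. Otherwise, a single 'bare version' item is enough to drop it. Items labelled 'other', such as commit hashes or branch names, never cause a sublist to be dropped.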

Upvotes: 1

Simon Provost

Reputation: 438

Regarding your last question about simplifying the extraction function: the same checks can be written more compactly with any(). Here's an enhancement:

import re

def extraction(parentTag):
    # Raw strings avoid invalid-escape warnings for \d and \w
    should_retain = not any(
        re.search(r"^(\d+.+\d)", imageTag)
        and not re.search(r"^(\d+.+\d[-^]\w)", imageTag)
        and not re.search(r"^(\d+.+\d[-^]\d)", imageTag)
        for imageTag in parentTag
    )
    if should_retain:
        return parentTag
    return None
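
As a rough check, assuming the sample data from the question, the refactored function can be applied with a list comprehension instead of filter and map. The names tags and result are made up for this sketch. Because the patterns are unchanged, the refactor only restructures the loop: it reproduces the question's current output, not the expected one.

```python
import re

def extraction(parentTag):
    # Same three patterns as in the question, written as raw strings
    should_retain = not any(
        re.search(r"^(\d+.+\d)", imageTag)
        and not re.search(r"^(\d+.+\d[-^]\w)", imageTag)
        and not re.search(r"^(\d+.+\d[-^]\d)", imageTag)
        for imageTag in parentTag
    )
    return parentTag if should_retain else None

# Sample data copied from the question
tags = [
    ['419adf7', '1.0.22-SNAPSSHOT'],
    ['1.0.24', '82e13c1', 'master'],
    ['1.0.25-1618314650'],
    ['1.0.10', '7ad4886'],
    ['1.0.13-1589279873', 'e597811'],
    ['73a3788'],
]

# Keep only the sublists that extraction() did not reject
result = [t for t in tags if extraction(t) is not None]
print(result)
# [['1.0.25-1618314650'], ['1.0.13-1589279873', 'e597811']]
```

Hash-like items such as '419adf7' start with digits and end with a digit, so they match the loose pattern ^(\d+.+\d). That is why their sublists are still dropped here.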

Upvotes: 0
