elekgeek

Reputation: 43

Python using dictionary for multiple RegEX re.sub

I am trying to clean up text from SNMP sysDescr.0 output with Python 3. I need to use a dictionary that maps regex patterns to their replacements, as follows:

myDict = {
    r' \(\/sw.+': '',
    r' \(\/ws.+$': '',
    r'Compiled on.{36}': '',
    r'Ruckus Wireless, Inc. ': '',
    r'Brocade Communications Systems, Inc. ': '',
    r' Switch': '',
    r', ROM': ' - ROM',
    r' revision': 'revision',
    r' IronWare': 'IronWare'
}

I found the code below here, but the first three patterns in the dictionary do not work while the rest are fine, and I don't know why:

def multiple_replace(myDict, text):
    regex = re.compile(r'(%s)' % '|'.join(map(re.escape, myDict.keys())))
    return regex.sub(lambda mo: myDict.get(mo.group(), mo.group()),text)

How can I modify the above function so that it applies the first three patterns as regular expressions? I tried most of the similar solutions here, but none of them could handle the first three patterns.

My simple version is below, but since I am new to Python I would really like to see how the first solution should be modified to work correctly:

def multiple_replace(myDict, text):
    for key, val in myDict.items():
        if re.search(key, text):
            text = re.sub(key, val, text)
    return text

Here is an example of the output:

HP J9856A 2530-24G-2SFP+ Switch, revision YA.16.05.0004, ROM YA.15.20 (/ws/swbuildm/rel_venice_qaoff/code/build/lakes(swbuildm_rel_venice_qaoff_rel_venice)) (Formerly ProCurve),.1.3.6.1.4.1.11.2.3.7.11.166
ProCurve J9088A Switch 2610-48, revision R.11.122, ROM R.10.06 (/sw/code/build/nemo),.1.3.6.1.4.1.11.2.3.7.11.77
Ruckus Wireless, Inc. ICX7250-48-HPOE, IronWare Version 08.0.70aT211 Compiled on Jan 18 2018 at 04:21:25 labeled as SPS08070a,.1.3.6.1.4.1.1991.1.3.62.2.2.1.1

and what I need it to become:

HP J9856A 2530-24G-2SFP+,revision YA.16.05.0004 - ROM YA.15.20,.1.3.6.1.4.1.11.2.3.7.11.166
HP J9088A 2610-48,revision R.11.122 - ROM R.10.06,.1.3.6.1.4.1.11.2.3.7.11.77
ICX7250-48-HPOE,IronWare Version 08.0.70aT211 SPS08070a,.1.3.6.1.4.1.1991.1.3.62.2.2.1.1

Honestly, I have no idea which approach is better or faster, so your input is appreciated.

Thanks

Upvotes: 2

Views: 2065

Answers (1)

Wiktor Stribiżew

Reputation: 627607

You should make sure that all your keys are compiled re objects. Once your regex-replacement dictionary is ready, iterate over its key-value pairs and perform the replacements one by one, with

for rx, repl in myDict.items():
    text = rx.sub(repl, text)

where rx will be the compiled re object and repl is the replacement string.
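As for why the function in the question skipped the first three patterns: it passes every key through re.escape, which turns regex metacharacters into literals, so those keys can only match text that literally contains the backslashes, dots and plus signs. In addition, its lambda looks up the matched text in the dictionary, and that text is never equal to a regex key. A minimal sketch of the first effect, using a shortened sample string chosen here for illustration:

```python
import re

pattern = r' \(\/sw.+'          # a regex: literal " (/sw" followed by anything
escaped = re.escape(pattern)    # metacharacters are now matched literally

# Shortened sample, assumed for illustration only
text = 'ROM R.10.06 (/sw/code/build/nemo),.1.3.6'

print(re.search(pattern, text) is not None)   # the regex finds a match
print(re.search(escaped, text) is not None)   # the escaped form finds none
```

This is why only the plain-string keys are wrapped in re.escape in the snippet that follows, while the real patterns are compiled as they are.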

Full code snippet:

import re
myDict = {
    re.compile(r' \(\/sw.+\)'): '',
    re.compile(r' \(\/ws.+\)'): '',
    re.compile(r'Compiled on.{36}'): '',
    re.compile(re.escape(r'Ruckus Wireless, Inc. ')): '',
    re.compile(re.escape(r'Brocade Communications Systems, Inc. ')): '',
    re.compile(re.escape(r' Switch')): '',
    re.compile(re.escape(r', ROM')): ' - ROM',
    re.compile(re.escape(r' revision')): 'revision',
    re.compile(re.escape(r' IronWare')): 'IronWare'
}
s = """HP J9856A 2530-24G-2SFP+ Switch, revision YA.16.05.0004, ROM YA.15.20 (/ws/swbuildm/rel_venice_qaoff/code/build/lakes(swbuildm_rel_venice_qaoff_rel_venice)) (Formerly ProCurve),.1.3.6.1.4.1.11.2.3.7.11.166
ProCurve J9088A Switch 2610-48, revision R.11.122, ROM R.10.06 (/sw/code/build/nemo),.1.3.6.1.4.1.11.2.3.7.11.77
Ruckus Wireless, Inc. ICX7250-48-HPOE, IronWare Version 08.0.70aT211 Compiled on Jan 18 2018 at 04:21:25 labeled as SPS08070a,.1.3.6.1.4.1.1991.1.3.62.2.2.1.1"""

def multiple_replace(myDict, text):
    for rx,repl in myDict.items():
        text = rx.sub(repl, text)
    return text

print(multiple_replace(myDict, s))

See the Python demo.

Output:

HP J9856A 2530-24G-2SFP+,revision YA.16.05.0004 - ROM YA.15.20,.1.3.6.1.4.1.11.2.3.7.11.166
ProCurve J9088A 2610-48,revision R.11.122 - ROM R.10.06,.1.3.6.1.4.1.11.2.3.7.11.77
ICX7250-48-HPOE,IronWare Version 08.0.70aT211 SPS08070a,.1.3.6.1.4.1.1991.1.3.62.2.2.1.1
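If you would still like a single-pass version closer to the first function in the question, one possible sketch is to wrap each pattern in its own capturing group and use the match's lastindex to pick the replacement. The names RULES, COMBINED and multiple_replace_once are made up for this example, and it assumes that none of the patterns contains capturing groups of its own:

```python
import re

# (pattern, replacement) pairs; plain strings are escaped, regexes are not.
# Assumption: no pattern contains its own capturing group.
RULES = [
    (r' \(/sw.+\)', ''),
    (r' \(/ws.+\)', ''),
    (r'Compiled on.{36}', ''),
    (re.escape('Ruckus Wireless, Inc. '), ''),
    (re.escape('Brocade Communications Systems, Inc. '), ''),
    (re.escape(' Switch'), ''),
    (re.escape(', ROM'), ' - ROM'),
    (re.escape(' revision'), 'revision'),
    (re.escape(' IronWare'), 'IronWare'),
]

# One capturing group per rule, joined into a single alternation
COMBINED = re.compile('|'.join('(%s)' % p for p, _ in RULES))

def multiple_replace_once(text):
    # mo.lastindex is the 1-based number of the group that matched,
    # which maps back to the rule that produced the match
    return COMBINED.sub(lambda mo: RULES[mo.lastindex - 1][1], text)

line = ('ProCurve J9088A Switch 2610-48, revision R.11.122, '
        'ROM R.10.06 (/sw/code/build/nemo),.1.3.6.1.4.1.11.2.3.7.11.77')
print(multiple_replace_once(line))
```

Keep in mind that a single pass behaves differently from the loop: replaced text is not scanned again, and when two patterns could match at the same position, the one listed first wins. For this data both approaches should give the same result, and I have not measured which one is faster.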

Upvotes: 3
