Data manipulation via regex in Python for removal/editing of certain data in parentheses

Question

I'm having a bit of an issue with the data manipulation below. This is example code; normally each line in datas arrives in the variable "data".

import re

datas = """Class (EN)
    Class (NA)
    CLASS (AA)
    CLASS-TWO (AA)
    Class3-A-H (NO)"""

datas = datas.split("\n")

for data in datas:
    data = data.strip()
    data = re.sub(r'\s*\(\w+\)\s*$', '', data)
    print(data)

If you run the above code, the school classes are returned without the class code (the bracketed part).

However, I have a few variations which require different handling...

Example: CLASS (NA) (N/A) should be returned as CLASS (N/A).

Example #2: CLASS (NA) (BB) should be returned as CLASS (B/B). (BB) is the only code that should never be removed; instead it should be changed to (B/B).

For example, the following data:

CLASS (EN)
CLASS (NA) (BB)
CLASS (AA) (N/A)
CLASS (N/A)
CLASS (BB)

Should return:

CLASS
CLASS (B/B)
CLASS (N/A)
CLASS (N/A)
CLASS (B/B)

I think this is fairly complicated. I've tried a fair few things, but I honestly struggle with the regex parts.

Thanks in advance - Hyflex

abarnert · Accepted Answer

The easy way to do this is in two steps.

First, sub each (BB) to (B/B) (which you can even do with str.replace instead of re.sub if you want).

Then, since (B/B) no longer matches the pattern, your existing code already does the right thing.

So:

data = re.sub(r'\(BB\)', '(B/B)', data)
data = re.sub(r'\s*\(\w+\)\s*$', '', data)
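
Put together and run against the sample data from the question, the two steps might look like the sketch below. Two things here are assumptions rather than part of the answer: the helper name clean_class_name is hypothetical, and the removal pattern is written without the trailing $ anchor. With the anchor, only a code at the very end of the line is removed, so (NA) in CLASS (NA) (B/B) would be left in place. Without it, every remaining (\w+) group is removed, which is what the expected output in the question asks for.

```python
import re

def clean_class_name(data):
    # Hypothetical helper, not from the original answer.
    # Step 1: rewrite (BB) as (B/B); the slash means \w+ can no longer match it.
    data = data.replace('(BB)', '(B/B)')
    # Step 2: remove every remaining "(word)" code plus the whitespace before it.
    # Assumption: no trailing $ anchor, so codes in the middle of the line go too.
    data = re.sub(r'\s*\(\w+\)', '', data)
    return data.strip()

samples = [
    "CLASS (EN)",
    "CLASS (NA) (BB)",
    "CLASS (AA) (N/A)",
    "CLASS (N/A)",
    "CLASS (BB)",
]

for sample in samples:
    print(clean_class_name(sample))
```

On the five sample lines this prints CLASS, CLASS (B/B), CLASS (N/A), CLASS (N/A) and CLASS (B/B), matching the expected output in the question.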
