Reputation: 5769
I'm having an issue with the data manipulation below. This is example code; normally each line in datas arrives in the variable "data".
import re
datas = """Class (EN)
Class (NA)
CLASS (AA)
CLASS-TWO (AA)
Class3-A-H (NO)"""
datas = datas.split("\n")
for data in datas:
    data = data.strip()
    # Remove a trailing bracketed code such as " (EN)"
    data = re.sub(r'\s*\(\w+\)\s*$', '', data)
    print(data)
If you run the above code, the school classes are returned without the class code (the bracketed part).
However, I have a few variations which require different handling...
Example: CLASS (NA) (N/A)
should be returned as: CLASS (N/A)
Example #2: CLASS (NA) (BB)
should be returned as: CLASS (B/B)
(BB) is the only code that should never be removed; instead it should be changed to (B/B).
For example the following data:
CLASS (EN)
CLASS (NA) (BB)
CLASS (AA) (N/A)
CLASS (N/A)
CLASS (BB)
Should return:
CLASS
CLASS (B/B)
CLASS (N/A)
CLASS (N/A)
CLASS (B/B)
I think this is fairly complicated. I've tried a fair few things, but I honestly struggle with the regex parts.
Thanks in advance - Hyflex
Upvotes: 0
Views: 91
Reputation: 2768
How about this one?
import re
datas = """Class (EN)(EL)
Class (NA)
CLASS (AA)
CLASS-TWO (AA)
Class3-A-H (NO)"""
datas = datas.split("\n")
for data in datas:
    data = data.strip()
    # Keep the first word, then rewrite the last two-character group as (X/Y)
    data = re.sub(r'^([^ ]+?) +.*\((.)/?(.)\) *$', r'\1 (\2/\3)', data)
    print(data)
For the sample data above, this prints:
Class (E/L)
Class (N/A)
CLASS (A/A)
CLASS-TWO (A/A)
Class3-A-H (N/O)
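To see how this pattern behaves on the five lines from the question, here is a small check. It is only a sketch: the names apply_answer and question_cases are introduced here for illustration, and the expected values are copied from the question. When I trace it through, four lines come out as requested, while "CLASS (EN)" becomes "CLASS (E/N)" rather than "CLASS", because the pattern always keeps the last two-character group.

```python
import re

# Pattern and replacement copied from the answer above.
PATTERN = r'^([^ ]+?) +.*\((.)/?(.)\) *$'


def apply_answer(data):
    """Hypothetical wrapper around the answer's substitution."""
    return re.sub(PATTERN, r'\1 (\2/\3)', data.strip())


# (input, expected output) pairs taken from the question.
question_cases = [
    ("CLASS (EN)", "CLASS"),
    ("CLASS (NA) (BB)", "CLASS (B/B)"),
    ("CLASS (AA) (N/A)", "CLASS (N/A)"),
    ("CLASS (N/A)", "CLASS (N/A)"),
    ("CLASS (BB)", "CLASS (B/B)"),
]

for raw, expected in question_cases:
    got = apply_answer(raw)
    status = "ok" if got == expected else "differs"
    print("%-18s -> %-12s (%s)" % (raw, got, status))
```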
Upvotes: 2
Reputation: 365677
The easy way to do this is in two steps.
First, sub each (BB) to (B/B) (which you can even do with str.replace instead of re.sub if you want).
Then, since (B/B) no longer matches the pattern, your existing code leaves it alone and still strips a plain trailing code such as (EN).
So:
data = re.sub(r'\(BB\)', '(B/B)', data)
data = re.sub(r'\s*\(\w+\)\s*$', '', data)
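A runnable sketch of the two-step idea on the five lines from the question follows. It makes one labeled assumption that is not in the code above: the `$` anchor is dropped from the second pattern, so a code that is followed by another bracketed group is removed too. With the anchor kept, "CLASS (NA) (BB)" would come out as "CLASS (NA) (B/B)", since "(NA)" is not at the end of the line. The helper name clean_class is hypothetical.

```python
import re


def clean_class(data):
    """Hypothetical helper: rewrite (BB), then strip plain bracketed codes."""
    data = data.strip()
    # Step 1: turn (BB) into (B/B) so it no longer looks like a plain code.
    data = data.replace('(BB)', '(B/B)')
    # Step 2: remove every bracketed group made only of word characters.
    # Assumption: no '$' anchor, so a code before another group goes too.
    # (N/A) and (B/B) contain '/', which \w does not match, so they stay.
    return re.sub(r'\s*\(\w+\)', '', data).strip()


samples = [
    "CLASS (EN)",
    "CLASS (NA) (BB)",
    "CLASS (AA) (N/A)",
    "CLASS (N/A)",
    "CLASS (BB)",
]

for line in samples:
    print(clean_class(line))
```

Dropping the anchor also removes plain codes in the middle of a line, so this only fits if class names themselves never contain a bracketed word.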
Upvotes: 4