Reputation: 87
I am trying to use re.sub to replace every variant of B2Ab in this file with (for this example) B2Az. The find part works perfectly, but every time I do a substitution, the raw regular expression text is inserted instead of the replacement I want. I am obviously missing something very simple, but I cannot work out what it is.
This is a snippet of the file I am parsing:
<t:ION t:SA="ModuleObj" t:H="2058" t:P="132" t:N="Data Rec 11" t:CI="DataRecorder_Module" t:L="GENB2AbE" t:S="1" t:SC="5">
<t:ION t:SA="RegisterObj" t:H="3978" t:P="2058" t:N="RE11 Data Log" t:CI="Log_Register" t:L="GENB2Ab">
<t:ION t:SA="ModuleObj" t:H="2059" t:P="132" t:N="Data Rec 12" t:CI="DataRecorder_Module" t:L="B2Ab2SBrkr" t:S="1" t:SC="5">
<t:IH>
<t:CH t:H="43715">
</t:CH>
<t:ION t:SA="ModuleObj" t:H="2058" t:P="132" t:N="Data Rec 11" t:CI="DataRecorder_Module" t:L="PMDC_B2_A_b_E" t:S="1" t:SC="5">
<t:IH>
<t:CH t:H="43715">
</t:CH>
<t:ION t:SA="ModuleObj" t:H="2058" t:P="132" t:N="Data Rec 11" t:CI="DataRecorder_Module" t:L="PMDC-B2-A-b_E" t:S="1" t:SC="5">
This is the latest iteration of my numerous attempts to get this to work:
var1 = 'B'
var2 = '2'
var3 = 'A'
var4 = 'b'
var5 = 'z'
fname = 'example.txt'
regexfind(fname,var1,var2,var3,var4,var5)
def regexfind(filename,varl1,varl2,varl3,varl4,varlr,):
    pattern = re.compile('([_-]?['+varl1+'][_-]?['+varl2+'][_-]?['+varl3+'][_-]?['+varl4+'][_-]?)')
    repl = re.compile('([_-]?['+varl1+'][_-]?['+varl2+'][_-]?['+varl3+'][_-]?['+varlr+'][_-]?)')
    f = open(filename,'rb')
    searchstrs = f.readlines()
    i=0
    for line in searchstrs:
        for match in re.finditer(pattern, line):
            print 'Found on line %s: %s' % (i+1, match.groups())
            #Showing decompiled patterns for ease of explanation
            line = re.sub(r'[_-]?['+varl1+'][_-]?['+varl2+'][_-]?['+varl3+'][_-]?['+varl4+'][_-]?',\
                r'[_-]?['+varl1+'][_-]?['+varl2+'][_-]?['+varl3+'][_-]?['+varlr+'][_-]?', line.rstrip())
            print(line)
This is what I am getting for output from the above code:
Found on line 1: ('B2Ab',)
<t:ION t:SA="ModuleObj" t:H="2058" t:P="132" t:N="Data Rec 11" t:CI="DataRecorder_Module" t:L="GEN[_-]?[B][_-]?[2][_-]?[A][_-]?[z][_-]?E" t:S="1" t:SC="5">
Found on line 1: ('B2Ab',)
<t:ION t:SA="RegisterObj" t:H="3978" t:P="2058" t:N="RE11 Data Log" t:CI="Log_Register" t:L="GEN[_-]?[B][_-]?[2][_-]?[A][_-]?[z][_-]?">
Found on line 1: ('B2Ab',)
<t:ION t:SA="ModuleObj" t:H="2059" t:P="132" t:N="Data Rec 12" t:CI="DataRecorder_Module" t:L="[_-]?[B][_-]?[2][_-]?[A][_-]?[z][_-]?2SBrkr" t:S="1" t:SC="5">
Found on line 1: ('_B2_A_b_',)
<t:ION t:SA="ModuleObj" t:H="2058" t:P="132" t:N="Data Rec 11" t:CI="DataRecorder_Module" t:L="PMDC[_-]?[B][_-]?[2][_-]?[A][_-]?[z][_-]?E" t:S="1" t:SC="5">
Found on line 1: ('-B2-A-b_',)
<t:ION t:SA="ModuleObj" t:H="2058" t:P="132" t:N="Data Rec 11" t:CI="DataRecorder_Module" t:L="PMDC[_-]?[B][_-]?[2][_-]?[A][_-]?[z][_-]?E" t:S="1" t:SC="5">
I should be getting this:
Found on line 1: ('B2Ab',)
<t:ION t:SA="ModuleObj" t:H="2058" t:P="132" t:N="Data Rec 11" t:CI="DataRecorder_Module" t:L="GENB2AzE" t:S="1" t:SC="5">
Found on line 1: ('B2Ab',)
<t:ION t:SA="RegisterObj" t:H="3978" t:P="2058" t:N="RE11 Data Log" t:CI="Log_Register" t:L="GENB2Az">
Found on line 1: ('B2Ab',)
<t:ION t:SA="ModuleObj" t:H="2059" t:P="132" t:N="Data Rec 12" t:CI="DataRecorder_Module" t:L="B2Az2SBrkr" t:S="1" t:SC="5">
Found on line 1: ('_B2_A_b_',)
<t:ION t:SA="ModuleObj" t:H="2058" t:P="132" t:N="Data Rec 11" t:CI="DataRecorder_Module" t:L="PMDC_B2_A_z_E" t:S="1" t:SC="5">
Found on line 1: ('-B2-A-b_',)
<t:ION t:SA="ModuleObj" t:H="2058" t:P="132" t:N="Data Rec 11" t:CI="DataRecorder_Module" t:L="PMDC-B2-A-z_E" t:S="1" t:SC="5">
Thanks in advance for any suggestions as to what I am missing here.
Upvotes: 0
Views: 51
Reputation: 67978
You need to define a replacement function of your own.
# Define this inside regexfind so it can see varl1, varl2, varl3 and varlr.
def repl(matchobj):
    return varl1+varl2+varl3+varlr
line = re.sub(r'[_-]?['+varl1+'][_-]?['+varl2+'][_-]?['+varl3+'][_-]?['+varl4+'][_-]?',repl, line.rstrip())
Something along these lines. The function can return anything you want.
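A replacement function receives the match object, so it can rebuild the text from captured groups rather than returning a fixed string (a fixed string would drop any _ or - separators that were matched). Below is a minimal sketch of that idea, assuming Python 3; the name swap_last_char and its parameters are made up for illustration and are not part of the question's code.

```python
import re

def swap_last_char(text, c1, c2, c3, old, new):
    """Replace `old` with `new` in c1-c2-c3-old sequences, keeping separators."""
    sep = '[_-]?'  # optional underscore or dash between characters
    pattern = re.compile(
        '(' + sep + re.escape(c1) + sep + re.escape(c2) + sep + re.escape(c3) + sep + ')'
        + re.escape(old)
        + '(' + sep + ')'
    )

    def repl(match):
        # group(1) is everything before the last character (with its separators),
        # group(2) is the optional trailing separator; only `old` is swapped out.
        return match.group(1) + new + match.group(2)

    return pattern.sub(repl, text)

print(swap_last_char('t:L="GENB2AbE"', 'B', '2', 'A', 'b', 'z'))
print(swap_last_char('t:L="PMDC-B2-A-b_E"', 'B', '2', 'A', 'b', 'z'))
```

Because the function's return value is inserted literally, this also works when the new character is a digit or a backslash, which a replacement string would try to interpret.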
Upvotes: 0
Reputation: 198436
The replacement is not a pattern. Its only special feature is that it replaces sequences \1, \2 etc. with capture group contents. So you can't match underscores and dashes in the replacement part; you want to copy them from the pattern.
pattern = re.compile('([_-]?'+varl1+'[_-]?'+varl2+'[_-]?'+varl3+'[_-]?)'+varl4+'([_-]?)')
line = re.sub(pattern, r'\1' + varlr + r'\2', line.rstrip())
On e.g. "PMDC_B2_A_b_E", this will capture "_B2_A_" and "_" into \1 and \2 respectively, then restore them in the replacement with z sandwiched in between, giving the replacement _B2_A_z_ and the final string "PMDC_B2_A_z_E".
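For reference, here is a self-contained sketch of the capture-group approach run against the labels from the question. It assumes Python 3, and the helper name replace_last is only illustrative. It uses the \g<1> form of the backreference, which behaves like \1 but stays unambiguous if the replacement character happens to be a digit.

```python
import re

def replace_last(line, varl1, varl2, varl3, varl4, varlr):
    sep = '[_-]?'  # optional underscore or dash
    # Group 1: prefix up to (not including) the character being replaced.
    # Group 2: the optional separator that follows it.
    pattern = re.compile(
        '(' + sep + varl1 + sep + varl2 + sep + varl3 + sep + ')'
        + varl4
        + '(' + sep + ')'
    )
    # \g<1> and \g<2> copy the captured text back into the result.
    return pattern.sub(r'\g<1>' + varlr + r'\g<2>', line)

labels = ['GENB2AbE', 'GENB2Ab', 'B2Ab2SBrkr', 'PMDC_B2_A_b_E', 'PMDC-B2-A-b_E']
for label in labels:
    print(label, '->', replace_last(label, 'B', '2', 'A', 'b', 'z'))
# GENB2AbE -> GENB2AzE
# GENB2Ab -> GENB2Az
# B2Ab2SBrkr -> B2Az2SBrkr
# PMDC_B2_A_b_E -> PMDC_B2_A_z_E
# PMDC-B2-A-b_E -> PMDC-B2-A-z_E
```

Note that the separators are copied from the input, so a dash-separated label keeps its dashes.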
Upvotes: 2