Reputation: 1
I have a bunch of input files and need to replace a few strings in them. First I created a dictionary of key-value pairs using regex. Each key is a string to be replaced and its value is the replacement.
Example line in an input file: Details of first student are FullName ="ABC XYZ KLM" FirstName ="ABC" ID = "123"
My dictionary looks like this:
student = {
    'ABC': 'Student Firstname',
    'ABC XYZ KLM': 'Student Fullname',
    '123': 'Student ID'
}
I am using str.replace() to do the replacement, like this:
for line in inputfile1:
    for src, dst in student.items():
        line = line.replace(src, dst)
The output I get is: Details of first student are FullName ="Student Firstname XYZ KLM" FirstName ="Student Firstname" ID = "Student ID"
What I am looking for is: Details of first student are FullName ="Student Fullname" FirstName ="Student Firstname" ID = "Student ID"
Can you please help me figure this out?
Upvotes: 0
Views: 81
Reputation: 14699
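This is happening because str.replace() replaces the ABC string first: the loop reaches 'ABC' before 'ABC XYZ KLM', so after the first replacement the full name reads "Student Firstname XYZ KLM" and the longer key no longer matches anything. You need to make sure that the longest pattern is replaced first.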
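To see the ordering problem concretely, the sketch below traces the question's loop one replacement at a time on the example line. It assumes Python 3.7+, where a plain dict iterates in insertion order, so 'ABC' is tried before 'ABC XYZ KLM' exactly as in the question's dictionary:

```python
student = {
    'ABC': 'Student Firstname',
    'ABC XYZ KLM': 'Student Fullname',
    '123': 'Student ID',
}
line = 'FullName ="ABC XYZ KLM" FirstName ="ABC" ID = "123"'

# Apply the replacements in dict order and print the line after each step.
# After the 'ABC' step the full name becomes "Student Firstname XYZ KLM",
# so the later 'ABC XYZ KLM' step finds nothing to replace.
for src, dst in student.items():
    line = line.replace(src, dst)
    print(repr(src), '->', line)
```

The final line printed is the unwanted result from the question.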
To do that, you can use one of these options:
Use an OrderedDict instead and put the longer strings to be replaced before the shorter ones:
In [3]: from collections import OrderedDict
In [6]: student = OrderedDict([('ABC XYZ KLM', 'Student Fullname'), ('ABC', 'Student Firstname'),('123', 'Student ID')])
In [7]: student.items()
Out[7]:
[('ABC XYZ KLM', 'Student Fullname'),
('ABC', 'Student Firstname'),
('123', 'Student ID')]
In [8]: line = 'FullName ="ABC XYZ KLM" FirstName ="ABC" ID = "123"'
In [9]: for src, dst in student.items():
...:     line = line.replace(src, dst)
In [10]: line
Out[10]: 'FullName ="Student Fullname" FirstName ="Student Firstname" ID = "Student ID"'
The overall code looks like this:
from collections import OrderedDict
student = OrderedDict([('ABC XYZ KLM', 'Student Fullname'),
('ABC', 'Student Firstname'),
('123', 'Student ID')])
line = 'FullName ="ABC XYZ KLM" FirstName ="ABC" ID = "123"'
for src, dst in student.items():
    line = line.replace(src, dst)
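Since Python 3.7, a regular dict is guaranteed to preserve insertion order, so on a modern interpreter the same ordering idea should work without OrderedDict. A minimal sketch, assuming Python 3.7+; as with the OrderedDict version, the longest key still has to be listed first by hand:

```python
# Assumes Python 3.7+, where dict insertion order is a language guarantee.
student = {
    'ABC XYZ KLM': 'Student Fullname',  # longest key listed first on purpose
    'ABC': 'Student Firstname',
    '123': 'Student ID',
}
line = 'FullName ="ABC XYZ KLM" FirstName ="ABC" ID = "123"'

# Replacements run in the order the keys were inserted above.
for src, dst in student.items():
    line = line.replace(src, dst)
print(line)
```

The catch with relying on hand-written order is that a key added later in the wrong position silently reintroduces the bug.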
Alternatively, as suggested by @AlexHal in the comments below, you can use a plain list of tuples and sort it by pattern length, longest first, before doing the replacement. The code looks like this:
In [2]: student = [('ABC', 'Student Firstname'),('123', 'Student ID'), ('ABC XYZ KLM', 'Student Fullname')]
In [3]: sorted(student, key=lambda x: len(x[0]), reverse=True)
Out[3]:
[('ABC XYZ KLM', 'Student Fullname'),
('ABC', 'Student Firstname'),
('123', 'Student ID')]
In [9]: line = ' "Details of first student are FirstName ="ABC" FullName ="ABC XYZ KLM" ID = "123"'
In [10]: for src, dst in sorted(student, key=lambda x: len(x[0]), reverse=True):
...:     line = line.replace(src, dst)
...:
In [11]: line
Out[11]: ' "Details of first student are FirstName ="Student Firstname" FullName ="Student Fullname" ID = "Student ID"'
Overall code:
student = [('ABC', 'Student Firstname'),
('123', 'Student ID'),
('ABC XYZ KLM', 'Student Fullname')]
line = ' "Details of first student are FirstName ="ABC" FullName ="ABC XYZ KLM" ID = "123"'
for src, dst in sorted(student, key=lambda x: len(x[0]), reverse=True):
    line = line.replace(src, dst)
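A different technique, not part of the original answer, is to do all the replacements in a single pass with the re module. You build one alternation pattern from the dictionary keys, sorted longest first because Python's regex alternation takes the first alternative that matches at a position rather than the longest one, and look each match up in the dict. Each piece of the original text is matched only once and the replacement text is never rescanned, so a value that happens to contain another key is not replaced a second time. It also works directly on the question's unordered dictionary. A minimal sketch; the helper name multi_replace is hypothetical:

```python
import re

def multi_replace(text, replacements):
    """Hypothetical helper: replace every key of `replacements` in one pass."""
    if not replacements:
        return text  # an empty pattern would match everywhere
    # Longest keys first so 'ABC XYZ KLM' is tried before its prefix 'ABC'.
    keys = sorted(replacements, key=len, reverse=True)
    # re.escape guards against keys that contain regex metacharacters.
    pattern = re.compile('|'.join(map(re.escape, keys)))
    # Each match is swapped for its dict value; the output is not rescanned.
    return pattern.sub(lambda m: replacements[m.group(0)], text)

student = {
    'ABC': 'Student Firstname',
    'ABC XYZ KLM': 'Student Fullname',
    '123': 'Student ID',
}
line = 'Details of first student are FullName ="ABC XYZ KLM" FirstName ="ABC" ID = "123"'
print(multi_replace(line, student))
```

The difference from chained str.replace() shows up with overlapping mappings: for {'a': 'b', 'b': 'c'}, chained replacement turns 'a' into 'c', while multi_replace('a', {'a': 'b', 'b': 'c'}) returns 'b'.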
Upvotes: 1