Reputation: 69
Suppose I have a string template, e.g.,
string="This is a {object}"
Now I create two (or more) strings by formatting this string, i.e.,
string.format(object="car")
=>"This is a car"
string.format(object="2020-06-05 16:06:30")
=>"This is a 2020-06-05 16:06:30"
Now I have lost the original string somehow. Is there a way to find out the original string using the 2 new strings that I have now?
Note: I have a data set of these strings, which were created from a template, but the original template was lost because of editing. New strings were then created from the new template and put in the same data set. I have tried an ML-based approach, but it doesn't seem to work in the general case. I am looking for an algorithm that gives me back the original string; the result could be one string or a group of strings in case the template has been changed multiple times.
Upvotes: 5
Views: 991
Reputation: 71451
A possibility could be to match the words and formatted value options in the input strings and then compare:
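import re

def get_vals(s):
    # Treat a "date time" pair as a single token; otherwise split into words.
    return re.findall(r'[\d\-]+\s[\d:]+|\w+', s)

vals = ["This is a car", "This is a 2020-06-05 16:06:30"]
# Positions where the tokens differ across the strings become the placeholder.
r = ' '.join('{object}' if len(set(i)) > 1 else i[0] for i in zip(*map(get_vals, vals)))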
Output:
'This is a {object}'
Upvotes: 2
Reputation: 2089
You could use one of the many "sequence alignment" algorithms, which are mostly used to align DNA sequences. These return the parts of the strings that are conserved across all inputs. You would then keep the conserved areas and put placeholders where a "mutation" happened to get the templates.
https://en.wikipedia.org/wiki/Multiple_sequence_alignment will get you started.
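As a rough sketch of the alignment idea, the snippet below aligns whitespace-separated tokens with the standard library's difflib.SequenceMatcher and folds the result over all strings. This is a simple pairwise stand-in, not a true multiple sequence alignment, and the names merge_tokens, infer_template and PLACEHOLDER are made up for illustration. It also assumes all strings come from one template; a mixed data set would have to be grouped first.

```python
from difflib import SequenceMatcher
from functools import reduce

# Hypothetical marker: the original field names cannot be recovered.
PLACEHOLDER = "{}"

def merge_tokens(a, b):
    """Align two token lists; keep matching runs, collapse the rest into PLACEHOLDER."""
    out = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])          # conserved region
        elif not out or out[-1] != PLACEHOLDER:
            out.append(PLACEHOLDER)       # "mutated" region, added once

    return out

def infer_template(strings):
    """Fold the pairwise merge over every string (assumes a single template)."""
    token_lists = [s.split() for s in strings]
    return " ".join(reduce(merge_tokens, token_lists))

print(infer_template(["This is a car", "This is a 2020-06-05 16:06:30"]))
```

Because the alignment works on tokens rather than fixed positions, a value that spans several words (such as the date and time above) still collapses into a single placeholder.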
Upvotes: 0
Reputation: 22021
You can find where the placeholders are, but you won't be able to recover their names. Taking the difference between two strings tells you where the templated parts sit.
Take a look at "Python - getting just the difference between strings" for suggestions on how to get the difference between two strings.
Below are some steps which may serve as a starting point:
{}
At the end you will have the template string from A.
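The diffing idea in this answer can be illustrated with difflib from the standard library. This is only a minimal sketch at character level, not a reconstruction of the steps listed in the answer; the function name differing_spans and the variables a and b are assumptions made for the example.

```python
from difflib import SequenceMatcher

def differing_spans(a, b):
    """Return (start, end) index pairs in `a` where `a` differs from `b`."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [(i1, i2)
            for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
            if tag != "equal"]

a = "This is a car"
b = "This is a 2020-06-05 16:06:30"
for start, end in differing_spans(a, b):
    # Each span marks a region of `a` that is likely a formatted value.
    print(start, end, repr(a[start:end]))
```

Character-level diffs can split a value into several small spans when the two values happen to share characters, so comparing tokens instead of characters is often more robust.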
Upvotes: 0