Akhil Garg

Reputation: 69

How to find string template from formatted strings?

Suppose I have a string template, e.g.,

string="This is a {object}"

Now I create two (or more) strings by formatting this string, i.e.,

string.format(object="car")
=>"This is a car"

string.format(object="2020-06-05 16:06:30")
=>"This is a 2020-06-05 16:06:30"

Now I have lost the original string somehow. Is there a way to find out the original string using the 2 new strings that I have now?

Note: I have a data set of these strings, which were created from a template, but the original template was lost because of editing. New strings were created from the new template and put in the same data set. I have tried an ML-based approach, but it doesn't seem to work in the general case. I am looking for an algorithm that gives me back the original string; it could be one string or a group of strings in case the template has been changed multiple times.

Upvotes: 5

Views: 991

Answers (3)

Ajax1234

Reputation: 71451

One possibility is to split each input string into words and formatted values (here, timestamps), then compare the resulting tokens position by position:

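import re

def get_vals(s):
    # A "date time" pair is kept as one token; everything else splits into words.
    return re.findall(r'[\d\-]+\s[\d:]+|\w+', s)

vals = ["This is a car", "This is a 2020-06-05 16:06:30"]
# Positions where the tokens differ between strings become the placeholder.
r = ' '.join('{object}' if len(set(i)) > 1 else i[0] for i in zip(*map(get_vals, vals)))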

Output:

'This is a {object}'

Upvotes: 2

Finn

Reputation: 2089

You could use one of the many "sequence alignment" algorithms, which are mostly used to align DNA sequences. These return the parts of the strings that are conserved. You would then keep the conserved areas and add placeholders wherever a "mutation" happened to get the templates.

https://en.wikipedia.org/wiki/Multiple_sequence_alignment will get you started.
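
As a rough illustration of the idea, here is a minimal sketch that aligns just two strings word by word with Python's standard `difflib.SequenceMatcher`. This is a pairwise alignment, not a true multiple sequence alignment, and the function name `infer_template` and the anonymous `{}` placeholder are assumptions made for the example:

```python
from difflib import SequenceMatcher

def infer_template(a, b, placeholder="{}"):
    """Keep the words both strings share; mark everything else as a placeholder."""
    ta, tb = a.split(), b.split()
    matcher = SequenceMatcher(None, ta, tb, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Conserved region: copy the words through unchanged.
            out.extend(ta[i1:i2])
        elif not out or out[-1] != placeholder:
            # "Mutation" (replace/insert/delete): emit a single placeholder.
            out.append(placeholder)
    return " ".join(out)

print(infer_template("This is a car", "This is a 2020-06-05 16:06:30"))
```

Because it splits on whitespace, punctuation attached to a changing word ends up inside the placeholder, and the placeholder names cannot be recovered.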

Upvotes: 0

Andriy Ivaneyko

Reputation: 22021

You can find where the placeholders were, but you won't be able to recover their names. The difference between two formatted strings tells you where the templated values sit.

Take a look at "Python - getting just the difference between strings" for suggestions on how to get the difference between two strings.

Below are some steps which may serve as a starting point:

  1. Get the difference between strings A and B as a list, and collect only the strings that come from A.
  2. Initialize template = A.
  3. Iterate over the differing strings and replace each of them in template with {}.

At the end you will have the template string derived from A.
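
The steps above can be sketched as follows, assuming a word-level diff via the standard library's `difflib.ndiff`. The function name `template_from_diff` is made up for this example:

```python
from difflib import ndiff

def template_from_diff(a, b):
    # Step 1: word-level diff; entries prefixed with "- " exist only in A.
    only_in_a = [d[2:] for d in ndiff(a.split(), b.split()) if d.startswith("- ")]
    # Step 2: start from A.
    template = a
    # Step 3: replace each differing word with a placeholder.
    # Caveat: str.replace matches substrings, so a short differing word
    # could also match inside an earlier, longer word.
    for word in only_in_a:
        template = template.replace(word, "{}", 1)
    return template

print(template_from_diff("This is a car", "This is a 2020-06-05 16:06:30"))
```

This only sees what differs between A and B, so a value that happens to be identical in both strings stays in the template as literal text.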

Upvotes: 0
