Andrew Downes

Reputation: 1088

Find differences between two strings with regex

I have been provided with two example input strings:

"Russia has entered the WWII in [A] [B] after german invasion"

"Russia has entered the WWII in September 1941 after german invasion"

There can be any characters before, after and between the [A] and [B] in the first string and there could be additional placeholders e.g. [C] [D] etc. Each placeholder can only occur once.

How can I use regex to match "September" and "1941"?

I need to match each placeholder in a single regex, not multiple steps.

My thoughts on a solution

I'm guessing the solution will be something like:

'Match everything in string 2 after everything before [A] in string 1 and before everything after [A] in string 1'.

I figured out (.*(:?\[A\])) and ((:?\[A\]).*) to get the text before and after the [A] in the first string, but can't figure out how to use that to look at the second string. Perhaps I need to concatenate the two things with some sort of delimiter and look at either side of the delimiter?

Upvotes: 1

Views: 2956

Answers (1)

ssc-hrep3

Reputation: 16089

If I understood your question correctly, you would like to use the fragments around [A] and [B] in the first string to find their respective values in the second string. You can do this in two steps.

First, extract the text around [A] and [B] with the following regular expression: ^(.*?)\[A\](.*?)\[B\](.*?)$. Its three capture groups (each pair of round brackets forms a group) hold the fragment before [A], the fragment between [A] and [B], and the fragment after [B].

In the second step, build a new regular expression out of those three fragments, with a capture group in place of each placeholder. How you do this depends on the programming language. In JavaScript, the match result can be used like this: new RegExp(matches1[1] + '(.*?)' + matches1[2] + '(.*?)' + matches1[3]). If the fragments can contain characters that are special in a regular expression, such as . or (, escape them before concatenating. Matching the second string against this expression gives you the two values as groups 1 and 2.

Here, the example is implemented in JavaScript:

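// Escape regex metacharacters so the fragments are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

var text1 = "Russia has entered the WWII in [A] [B] after german invasion";
// Capture the text before [A], between [A] and [B], and after [B].
var regex1 = /^(.*?)\[A\](.*?)\[B\](.*?)$/;
var matches1 = text1.match(regex1);

var text2 = "Russia has entered the WWII in September 1941 after german invasion";
// Rebuild the pattern with a capture group in place of each placeholder.
var regex2 = new RegExp('^' + escapeRegExp(matches1[1]) + '(.*?)' + escapeRegExp(matches1[2]) + '(.*?)' + escapeRegExp(matches1[3]) + '$');
var matches2 = text2.match(regex2);

console.log(matches2[1]); // September
console.log(matches2[2]); // 1941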
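
Since the question mentions that there could be further placeholders such as [C] and [D], the same two-step idea can be generalized. The following is only a sketch: the helper names escapeRegExp and extractPlaceholders are made up for this example, and it assumes that every placeholder is a single uppercase letter in square brackets and occurs once.

```javascript
// Escape regex metacharacters so template text is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: returns an object mapping each placeholder name
// to the text found in its position, or null if the strings do not line up.
// Assumption: placeholders look like [A], [B], ... (one uppercase letter).
function extractPlaceholders(template, filled) {
  // Splitting on a pattern with a capture group keeps the captured names:
  // even indices are literal text, odd indices are placeholder names.
  var parts = template.split(/\[([A-Z])\]/);
  var names = [];
  var source = '^';
  parts.forEach(function (part, i) {
    if (i % 2 === 0) {
      source += escapeRegExp(part);
    } else {
      names.push(part);
      source += '(.*?)';
    }
  });
  source += '$';

  var match = filled.match(new RegExp(source));
  if (!match) {
    return null;
  }
  var values = {};
  names.forEach(function (name, i) {
    values[name] = match[i + 1];
  });
  return values;
}

var found = extractPlaceholders(
  "Russia has entered the WWII in [A] [B] after german invasion",
  "Russia has entered the WWII in September 1941 after german invasion"
);
console.log(found.A); // September
console.log(found.B); // 1941
```

Because the groups are lazy, a value that itself contains the separator between two placeholders (for example a space between [A] and [B]) is split at the first occurrence, so adjacent placeholders remain ambiguous in general.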

Upvotes: 1
