user4002542

Backward capture group concatenated with forward capture group

I think the title says it all. I'm trying to get groups and concatenate them together.

I have this text:

GPX 10.802.123/3843­ 1 -­ IDENTIFIER 48

And I want this output:

IDENTIFIER 10.802.123/3843-48

To be explicit: I want to capture one group before this word and one group after it, then concatenate the two, using only regex. Is this possible?

I can already extract the 48 like this:

var text = 'GPX 10.802.123/3843­ 1 -­ IDENTIFIER 48';
var reg = new RegExp('IDENTIFIER' + '.*?(\\d\\S*)', 'i');
var match = reg.exec(text);

Output:

48

Can it be done?

I'm offering 200 points.

Upvotes: 4

Views: 125

Answers (5)

Casimir et Hippolyte

Reputation: 89547

You can use split too:

var text = 'GPX 10.802.123/3843­ 1 -­ IDENTIFIER 48';

var parts = text.split(/\s+/);

if (parts[4] == 'IDENTIFIER') {
    var result = parts[4] + ' ' + parts[1] + '-' + parts[5];
    console.log(result);
} 

Upvotes: 0

vks

Reputation: 67968

^\s*\S+\s*\b(\d+(?:[./]\d+)+)\b.*?-.*?\b(\S+)\b\s*(\d+)\s*$

You can try this. Replace with $2 $1-$3. See the demo.

https://regex101.com/r/sS2dM8/38

var re = /^\s*\S+\s*\b(\d+(?:[.\/]\d+)+)\b.*?-.*?\b(\S+)\b\s*(\d+)\s*$/gm; 
var str = 'GPX 10.802.123/3843­ 1 -­ IDENTIFIER 48';
var subst = '$2 $1-$3'; 

var result = str.replace(re, subst);

Upvotes: 1

Michael Laszlo

Reputation: 12239

You must precisely define the groups that you want to extract before and after the word. If you define the group before the word as four or more non-whitespace characters, and the group after the word as one or more non-whitespace characters, you can use the following regular expression.

var re = new RegExp('(\\S{4,})\\s+(?:\\S{1,3}\\s+)*?' + word + '.*?(\\S+)', 'i');
var groups = re.exec(text);
if (groups !== null) {
   var result = groups[1] + groups[2];
}

Let me break down the regular expression. Note that we have to escape the backslashes because we're writing a regular expression inside a string.

  • (\\S{4,}) captures a group of four or more non-whitespace characters
  • \\s+ matches one or more whitespace characters
  • (?: indicates the start of a non-capturing group
  • \\S{1,3} matches one to three non-whitespace characters
  • \\s+ matches one or more whitespace characters
  • )*? makes the non-capturing group match zero or more times, as few times as possible
  • word matches whatever was in the variable word when the regular expression was compiled
  • .*? matches any character zero or more times, as few times as possible
  • (\\S+) captures one or more non-whitespace characters
  • the 'i' flag makes this a case-insensitive regular expression

Observe that our use of the ? modifier allows us to capture the nearest groups before and after the word.
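One caveat with building the pattern by string concatenation: if word contains characters that are special in a regular expression (parentheses, dots, plus signs), they are interpreted as regex syntax rather than matched literally. Here is a minimal sketch of one way to guard against that. The names escapeRegExp and nearestGroups are made up for this illustration and are not part of the answer above, and the sample strings assume plain ASCII spaces and hyphens.

```javascript
// Escape every character that has a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: same pattern as above, but with the word escaped first.
// Returns the two captured groups joined together, or null if nothing matches.
function nearestGroups(word, text) {
  var re = new RegExp(
    '(\\S{4,})\\s+(?:\\S{1,3}\\s+)*?' + escapeRegExp(word) + '.*?(\\S+)',
    'i'
  );
  var groups = re.exec(text);
  return groups === null ? null : groups[1] + groups[2];
}

var joined = nearestGroups('IDENTIFIER', 'GPX 10.802.123/3843- 1 -- IDENTIFIER 48');
console.log(joined); // "10.802.123/3843-48"

// A word containing parentheses is now matched literally.
var special = nearestGroups('ID(1)', 'AAAA 12345678 x ID(1) 99');
console.log(special); // "1234567899"
```

Without the escaping step, the parentheses in 'ID(1)' would open an extra capture group and shift the group numbers, so the concatenation would pick up the wrong pieces.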

You can match the regular expression globally in the text by adding the g flag. The snippet below demonstrates how to extract all matches.

function forward_and_backward(word, text) {
  var re = new RegExp('(\\S{4,})\\s+(?:\\S{1,3}\\s+)*?' + word + '.*?(\\S+)', 'ig');
  // Find all matches and make an array of results.
  var results = [];
  while (true) {
    var groups = re.exec(text);
    if (groups === null) {
      return results;
    }
    var result = groups[1] + groups[2];
    results.push(result);
  }
}

var sampleText = "  GPX 10.802.123/3843- 1 -- IDENTIFIER 48   A BC 444.2345.1.1/99x 28 - - Identifier 580 X Y Z 9.22.16.1043/73+ 0  ***  identifier 6800";

var results = forward_and_backward('IDENTIFIER', sampleText);
for (var i = 0; i < results.length; ++i) {
  document.write('result ' + i + ': "' + results[i] + '"<br><br>');
}

body {
  font-family: monospace;
}

Upvotes: 3

Avinash Raj

Reputation: 174696

This is possible with the replace function.

var s = 'GPX 10.802.123/3843­ 1 -­ IDENTIFIER 48'
var result = s.replace(/.*?(\S+)\s+\d+\s*-\s*(IDENTIFIER)\s*(\d+).*/, "$2 $1-$3");
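One thing to keep in mind with this approach: when the pattern does not match, replace returns the input unchanged, so a failed match can go unnoticed. The sketch below is one possible way to make that case explicit, using a replacer function instead of the "$2 $1-$3" string. The helper name rearrange is hypothetical, and the sample input assumes plain ASCII spaces and hyphens.

```javascript
var re = /.*?(\S+)\s+\d+\s*-\s*(IDENTIFIER)\s*(\d+).*/;

// Hypothetical helper: returns the rearranged string,
// or null when the pattern does not match at all.
function rearrange(s) {
  if (!re.test(s)) {
    return null; // replace() alone would hand back s unchanged here
  }
  // The replacer receives the whole match followed by the three captured groups.
  return s.replace(re, function (whole, number, word, suffix) {
    return word + ' ' + number + '-' + suffix;
  });
}

var ok = rearrange('GPX 10.802.123/3843 1 - IDENTIFIER 48');
console.log(ok); // "IDENTIFIER 10.802.123/3843-48"

var miss = rearrange('no keyword here');
console.log(miss); // null
```

If the real input contains invisible characters next to the dash, such as a soft hyphen (U+00AD), \s* will not skip over them and the pattern will not match, which is exactly the case the null return makes visible.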

Upvotes: 1

anubhava

Reputation: 784958

You can do:

var text = 'GPX 10.802.123/3843­ 1 -­ IDENTIFIER 48';
var match = /GPX\s+(.+?) \d .*?(IDENTIFIER).*?(\d\S*)/i.exec(text);

var output = match[2] + ' ' + match[1] + '-' + match[3];
//=> "IDENTIFIER 10.802.123/3843­-48"

Upvotes: 3
