user11944367

Reputation: 11

Simple regex extraction into multiple groups

Given this text

unit: 100 street: 200 city: 300

How do I write a regex in JS that gives this output as separate groups? Note that the text may not contain street or city; it could be just unit: 100.

group[1] - 100
group[2] - 200
group[3] - 300

Where I'm at so far - unit:\s(.*?)\s(?:.*|$)

Not sure how to proceed further and get all 3 groups in one regex!

Upvotes: 0

Views: 54

Answers (4)

The fourth bird

Reputation: 163477

You can use a single capture group:

[^\s:]+:[^\S\n]*(\S+)


const regex = /[^\s:]+:[^\S\n]*(\S+)/g;
[
  `unit: 100 street: 200 city: 300`,
  `unit: 100`
].forEach(s =>
  console.log(
    Array.from(s.matchAll(regex), m => m[1])
  )
);
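Since street or city may be missing, positional results can become ambiguous (with only unit and city present, the city value ends up second). A minimal sketch that also captures the key, assuming an object keyed by label is acceptable; pairRegex and toObject are hypothetical names, not part of the original answer:

```javascript
// Same pattern as above, with an extra capture group around the key
const pairRegex = /([^\s:]+):[^\S\n]*(\S+)/g;

// toObject is a hypothetical helper: builds { key: value } from every match
const toObject = s =>
  Object.fromEntries(Array.from(s.matchAll(pairRegex), m => [m[1], m[2]]));

console.log(toObject("unit: 100 city: 300"));
// → { unit: '100', city: '300' }
```

This relies on String.prototype.matchAll and Object.fromEntries, so it needs a reasonably recent engine.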

Or you can use optional nested groups, but the pattern will be longer if you want to use more groups:

[^\s:]+:[^\S\n]*(\S+)(?:[^\S\n]+[^\s:]+:[^\S\n]*(\S+)(?:[^\S\n]+[^\s:]+:[^\S\n]*(\S+))?)?
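A short sketch of how the nested pattern behaves in JS, mainly to show that groups which do not participate in the match come back as undefined. The helper name extractGroups is hypothetical:

```javascript
// Nested optional groups: up to three key/value pairs in one match
const nested = /[^\s:]+:[^\S\n]*(\S+)(?:[^\S\n]+[^\s:]+:[^\S\n]*(\S+)(?:[^\S\n]+[^\s:]+:[^\S\n]*(\S+))?)?/;

// extractGroups is a hypothetical helper name
function extractGroups(s) {
  const m = s.match(nested);
  // slice(1) drops the full match and keeps only the capture groups
  return m ? m.slice(1) : [];
}

console.log(extractGroups("unit: 100 street: 200 city: 300"));
// → [ '100', '200', '300' ]
console.log(extractGroups("unit: 100"));
// → [ '100', undefined, undefined ]
```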


Upvotes: 1

zer00ne

Reputation: 44068

const rgx = /\D*(\d)(\d)+/g


Segment   Description
\D*       Zero or more non-digits
(\d)      First capture group: a single digit
(\d)+     Second capture group, repeated one or more times; after the match it holds only the last digit
g         Global flag

.replace() with

"group[$1] - $1$2$2\n"
Segment     Description
group[$1]   Literal group[, then the first capture group (\d), then a literal ]
" - "       A space, a literal hyphen, and a space
$1$2$2\n    The first capture group (\d) again, then the second capture group (\d)+ twice, and finally a newline

Example A

const str = `unit: 100 street: 200 city: 300`;
const rgx = /\D*(\d)(\d)+/g;

const res = str.replace(rgx, "group[$1] - $1$2$2\n");
console.log(res);
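One caveat worth checking before reusing Example A: a repeated capture group such as (\d)+ keeps only its last iteration, so the replacement reproduces the number only when the trailing digits are identical. A quick sketch (the input strings are made up for illustration):

```javascript
const rgx = /\D*(\d)(\d)+/g;
const template = "group[$1] - $1$2$2\n";

// Works for 100 and 200 because the last two digits are the same
const repeated = "unit: 100 street: 200".replace(rgx, template);
console.log(repeated);
// → group[1] - 100
// → group[2] - 200

// With distinct digits, $2 is only the last digit that (\d)+ matched
const distinct = "unit: 123".replace(rgx, template);
console.log(distinct);
// → group[1] - 133
```

The label inside group[...] is also the first digit of the value rather than a running index, so it lines up only for inputs like 100, 200, 300.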

Note: The following comment was not considered in Example A:

"It need not be numbers, could be any text."

The question should have this criterion added, along with an appropriate example:

'unit: 100 street: Main city: Springfield'

Because of this part: group[?], more than one method is needed. See Example B for a solution.

Example B

const str = 'unit: 100 street: Main city: Springfield';
const rgx = /\b(\w+:) ([\w]+)/g;

const res = str.replace(rgx, "$2")
            .split(' ')
            .map((s, i) => 
              "group["+(i + 1)+"] - "+s).join('\n');

console.log(res);

Upvotes: 0

anubhava

Reputation: 785721

You may do this using split as well:

const s = 'unit: 100 street: 200 city: 300';

var arr = s.split(/\s*\w+:\s*/).filter(Boolean);

console.log( arr );

//=> ["100", "200", "300"]

Here \s*\w+:\s* matches 0 or more whitespace characters, followed by 1+ word characters, then : and 0 or more whitespace characters.

Note that filter(Boolean) is only used to remove empty elements from the resulting array.
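As a quick check of the split approach against the other cases raised in the question (missing fields, non-numeric values), here is a small sketch; valuesOf is a hypothetical wrapper name:

```javascript
// valuesOf is a hypothetical helper wrapping the split approach above
const valuesOf = s => s.split(/\s*\w+:\s*/).filter(Boolean);

console.log(valuesOf("unit: 100"));
// → [ '100' ]
console.log(valuesOf("unit: 100 street: Main city: Springfield"));
// → [ '100', 'Main', 'Springfield' ]
```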

Upvotes: 1

Tim Biegeleisen

Reputation: 522331

We can use match() here:

var input = "unit: 100 street: 200 city: 300";
var matches = input.match(/(?<=: ).*?(?=\s*\w+:|$)/g);
console.log(matches);
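The lookbehind (?<=: ) needs an engine with ES2018 lookbehind support. If that is a concern, a possible variant without lookbehind consumes the key and captures the value instead. This is only a sketch; the pattern and variable names are my own, not from the answer above:

```javascript
const input = "unit: 100 street: 200 city: 300";

// Consume "key:" and optional whitespace, then lazily capture the value
// up to the next "key:" or the end of the input
const pattern = /\w+:\s*(.*?)(?=\s*\w+:|$)/g;

const values = Array.from(input.matchAll(pattern), m => m[1]);
console.log(values);
// → [ '100', '200', '300' ]
```

Because the value is matched lazily up to the next key, multi-word values such as "Main Street" should also come through in one piece.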

Upvotes: 0
