tnabdb

Reputation: 547

How to repeatedly capture a group?

Suppose I have the following string:

Bimkingo Clasico Prom 135g LON 49835 Gansito ME 5p 250g MTA MLA 49860 Wonder

I want to extract only the tokens that contain no digits and do not consist solely of uppercase letters.

The output should be:

Bimkingo Clasico Prom Gansito Wonder

This doesn't seem to work: \b(([a-zA-Z]+)+)\b.
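
To make the problem concrete, here is a minimal sketch of what that pattern returns for the string above. The variable names are only illustrative, and it assumes a JavaScript engine: the pattern accepts any run of letters between word boundaries, so the all-uppercase tokens are not filtered out.

```javascript
// Illustrative check of the pattern from the question (names are hypothetical).
const input = 'Bimkingo Clasico Prom 135g LON 49835 Gansito ME 5p 250g MTA MLA 49860 Wonder';
const opPattern = /\b(([a-zA-Z]+)+)\b/g;

// With the g flag, match() returns every full match, or null when there is none.
const opMatches = input.match(opPattern) || [];

// All-uppercase tokens such as LON and ME are still present in the result.
console.log(opMatches.join(' '));
```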

Upvotes: 2

Views: 67

Answers (2)

m-a-r-c-e-l-i-n-o

Reputation: 2672

The following regex, (\b[a-zA-Z]+[a-z]+\b), should work as expected for the example output in the OP's post and for a few other edge cases:

var regexp = /(\b[a-zA-Z]+[a-z]+\b)/g;

// OP's example: the matches, joined with spaces, should equal the expected output.
var input = 'Bimkingo Clasico Prom 135g LON 49835 Gansito ME 5p 250g MTA MLA 49860 Wonder';
var expected = 'Bimkingo Clasico Prom Gansito Wonder';
var matches = input.match(regexp);
console.log('Matches test output provided by OP "' + expected + '":');
console.log(expected === matches.join(' ')); // true

// Cases not contained in OP's example output string...
var mixedCase = 'mArceLino marcelino';
console.log('Also matches all lowercase and uppercase mid word "' + mixedCase + '":');
console.log(mixedCase.match(regexp).length === 2); // true

var withDigits = 'MARCEL1N0 MARCELINO11';
console.log('Excludes all uppercase with number mix "' + withDigits + '":');
console.log(withDigits.match(regexp) === null); // true

The accepted answer by "krzyk" reproduces the output the OP posted, but it does not fully "extract only tokens that don't contain numbers or only upper case letters": it misses edge cases that the OP's example does not cover, such as a token with an uppercase letter in the middle of the word (mArceLino). Run the code snippet above to see how the regex in this answer handles those cases.

Regex explanation:

( --> // start capture group
  \b --> // match a word boundary (start of word)
  [a-zA-Z]+ --> // match one or more letters, lowercase or uppercase
  [a-z]+ --> // match one or more lowercase letters, so the word must end in lowercase
  \b --> // match a word boundary (end of word)
) --> // end capture group
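
One consequence of the breakdown above is worth checking: because [a-zA-Z]+ and [a-z]+ each need at least one character, a token has to be at least two letters long and end in a lowercase letter to match. A small sketch with made-up probe tokens (extractTokens is a hypothetical helper, not something from the question):

```javascript
// Hypothetical helper wrapping the pattern from this answer.
const tokenPattern = /\b[a-zA-Z]+[a-z]+\b/g;

function extractTokens(text) {
  // match() with the g flag returns all full matches, or null if nothing matched.
  return text.match(tokenPattern) || [];
}

// Made-up probe tokens: a single letter, and a word ending in uppercase.
const probes = extractTokens('a Ab ABc abC');

// Only 'Ab' and 'ABc' satisfy "two or more letters, last one lowercase".
console.log(probes);
```

Whether single-letter tokens or tokens ending in an uppercase letter should count is not stated in the question, so treat this as a property to be aware of rather than a defect.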

Upvotes: 1

Krzysztof Krasoń

Reputation: 27476

Use:

(\b[A-Z]?[a-z]+\b)+

Then retrieve all the groups through your regexp library (there is no way to have them all in a single group) and join them with spaces.

Test case: https://regex101.com/r/hY2fM8/1
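
As a rough illustration of "retrieve all the groups and join them with spaces", here is one way it could look in JavaScript. The answer does not name a language, so the use of String.prototype.matchAll and the variable names are assumptions made for this sketch.

```javascript
// Sketch only: collect capture group 1 from every match, then join with spaces.
const text = 'Bimkingo Clasico Prom 135g LON 49835 Gansito ME 5p 250g MTA MLA 49860 Wonder';
const wordPattern = /(\b[A-Z]?[a-z]+\b)+/g;

// matchAll() requires the g flag and yields one match object per occurrence;
// index 1 of each match object holds the captured group.
const groups = Array.from(text.matchAll(wordPattern), (m) => m[1]);

const joined = groups.join(' ');
console.log(joined);
```

Each match here contains a single repetition of the group, so the captured group and the full match are the same text.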

Upvotes: 1
