Capstone

Reputation: 2282

How to extract the last occurrence of a string between two delimiters, when the input contains multiple delimiters, using regex?

I am trying to extract

`Something is "670px ?
am"` 

from my input string:

`100000000670
? "Result was 670670px"
? Something is "670px ?
am"
? `

But I can't seem to get it quite right.

If I use the following regex:

/\n\?(.*?)\n\?\s*$/s

Then the extracted string is:

`"Result was 670670px"
? Something is "670px ?
am"`

And if I use:

/\n\?((?!\n\?)*?)\n\?\s*$/s

Then there is no match. I've tried other regexes, but none seem to do the trick.

I want to extract all the characters that occur between the last two question marks that start on a newline. There may be other question marks present in the characters to be extracted. The input string always ends in a question mark followed by a space. What regex will extract the relevant string?

Upvotes: 1

Views: 97

Answers (4)

The fourth bird

Reputation: 163207

To match the last string that occurs between two question marks which start on a newline:

\n\?[^\S\n]*(.*(?:\n(?!\?).*)*)\n[^\S\n]*\?[^\S\n]*$
  • \n\? Match a newline and ?
  • [^\S\n]* Match optional whitespace chars without a newline
  • ( Capture group 1
    • .* Match the rest of the line
    • (?:\n(?!\?).*)* Match all lines that do not start with ?
  • ) Close group
  • \n[^\S\n]*\?[^\S\n]* Match a newline and a ? between optional spaces
  • $ End of string


const regex = /\n\?[^\S\n]*(.*(?:\n(?!\?).*)*)\n[^\S\n]*\?[^\S\n]*$/;
const s = `100000000670
? "Result was 670670px"
? Something is "670px ?
am"
? `;

const m = s.match(regex);
if (m) {
  console.log(m[1]);
}

Upvotes: 1

MikeM

Reputation: 13631

From the question: "I want to extract the last string that occurs between two question marks which start on a newline. There may be other question marks in the middle but I want to ignore those. The input string always ends in a question mark followed by a space."

const input = `100000000670
? "Result was 670670px"
? "670px ?
am"
? `;

const regex = /\n\?((?:[^\n]|\n(?!\?))*\n)\? $/;

const match = input.match(regex);

console.log(match[1]);

If you want to exclude the leading space and trailing newline just use trim() or trivially adjust the regex. I am just following the letter of the quote I included above, and interpreting "last string" as all characters between the question marks.
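A minimal sketch of the trim() option, assuming the same regex as above. The helper name `extractLast` is hypothetical, and the null check is an addition for inputs that do not match.

```javascript
// Same pattern as in the answer above; capture group 1 keeps the leading
// space and the trailing newline of the last block.
const lastBlockRegex = /\n\?((?:[^\n]|\n(?!\?))*\n)\? $/;

// Hypothetical helper: returns the trimmed capture, or null when nothing matches.
function extractLast(text) {
  const match = text.match(lastBlockRegex);
  return match ? match[1].trim() : null;
}

const sample = `100000000670
? "Result was 670670px"
? "670px ?
am"
? `;

console.log(extractLast(sample));
```

Trimming after the match keeps the regex itself unchanged, which may be easier to maintain than adjusting the pattern.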

Further explanation on request.

Upvotes: 1

Mike Robinson

Reputation: 8945

One strategy might be to split the string based on the delimiters, creating an array. Then, pop the last element off of the array. (Or maybe the next-to-last.)

An advantage of such logic, if it applies here, would be that it is fairly obvious what is happening. "Regex spaghetti-code" is a thing to be avoided whenever possible.
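The split strategy could be sketched roughly as follows, assuming the delimiter is a question mark at the start of a line, as in the question. The function name `lastBetweenDelimiters` is hypothetical. Because the input ends with a delimiter, the wanted piece is the next-to-last element rather than the last.

```javascript
// Hypothetical helper: split on every "?" that starts a line, then take the
// next-to-last piece (the last piece is whatever follows the final "?").
function lastBetweenDelimiters(text) {
  const parts = text.split(/\n\?/);
  // Two delimiters produce three pieces, so fewer than three pieces means
  // there is no block enclosed by two delimiters.
  if (parts.length < 3) return null;
  return parts[parts.length - 2].trim();
}

const questionInput = `100000000670
? "Result was 670670px"
? Something is "670px ?
am"
? `;

console.log(lastBetweenDelimiters(questionInput));
```

The question mark inside the quoted text is not treated as a delimiter here, because it is followed by a newline rather than preceded by one.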

Upvotes: 0

MonkeyZeus

Reputation: 20737

This would work:

"[^"]+"(?=\s+\? $(?![\r\n]))
  • "[^"]+" - get double-quoted content
  • (?=\s+\? $(?![\r\n])) - ahead of me must be the question mark + space which terminates the string
    • JS does not support the "end of string" meta-character \Z so we emulate it with $(?![\r\n]) which translates to "no more line breaks after the end of the line"
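As a rough check, the pattern above can be run against the question's current input in JavaScript. The variable names are hypothetical. Note that, as this sketch suggests, the pattern returns only the double-quoted part and not the `Something is` text that precedes it.

```javascript
// The pattern from this answer, written as a JavaScript regex literal.
const quotedRegex = /"[^"]+"(?=\s+\? $(?![\r\n]))/;

const currentInput = `100000000670
? "Result was 670670px"
? Something is "670px ?
am"
? `;

const quotedMatch = currentInput.match(quotedRegex);
// Full match is the last double-quoted run, including the quotes.
const quotedText = quotedMatch ? quotedMatch[0] : null;
console.log(quotedText);
```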

https://regex101.com/r/lzd2gY/1/

Upvotes: 0
