lfergon

Reputation: 973

Regex catch string between two strings, multiple lines

I'm working on a *.po file and trying to capture all the text between msgid "" and msgstr "". So far I've had no luck: my attempts never match more than one line:

msgid ""
"%s asdfgh asdsfgf asdfg %s even if you "
"asdfgdh sentences with no sense. We are not asking  translate "
"Shakespeare's %s Hamlet %s !. %s testing regex %s "
"don't require specific industry knowledge. enjoying "
msgstr ""

What I've tried:

var myArray = fileContent.match(/msgid ([""'])(?:(?=(\\?))\2.)*?\1/g);

Thanks for your help, I'm not very good with regex :(

Upvotes: 4

Views: 10405

Answers (4)

Eric Seastrand

Reputation: 2633

I realize that the question specifically asks for a regular expression, but you should consider using string split instead if you can.

Here is a ready-made function:

function extractTextBetween(subject, start, end) {
    try{
        return subject.split(start)[1].split(end)[0];
    } catch(e){
        console.log("Exception when extracting text", e);
    }
}

http://jsfiddle.net/b33hdh9b/3/
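As a rough usage sketch of the split approach, the snippet below runs a variant of the function on a made-up two-line entry. The `sample` text and the explicit missing-marker check are assumptions added for illustration, not part of the original answer.

```javascript
// Hypothetical sample input modelled on the .po entry in the question.
const sample = [
  'msgid ""',
  '"first line "',
  '"second line"',
  'msgstr ""',
].join("\n");

// Variant of the answer's function that returns undefined instead of
// relying on try/catch when the start marker is absent.
function extractTextBetween(subject, start, end) {
  const afterStart = subject.split(start)[1];
  if (afterStart === undefined) {
    return undefined; // start marker not found
  }
  // If the end marker is absent, this returns everything after the start.
  return afterStart.split(end)[0];
}

const between = extractTextBetween(sample, 'msgid ""', 'msgstr ""');
console.log(JSON.stringify(between));
```

Note that this only returns the text after the first occurrence of the start marker, so a file with many entries would need a loop or a global regex.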

Upvotes: 2

Casimir et Hippolyte

Reputation: 89629

Try with this pattern:

/msgid (["']{2})\n([\s\S]*?)\nmsgstr \1/

The result is in the second capturing group, but you can simplify it to:

/msgid ["']{2}\n([\s\S]*?)\nmsgstr /

Here the result is in the first capturing group.
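A minimal sketch of reading the captured group with the simpler pattern. The `poText` string is a made-up sample, and the `\r?\n` variant is an assumption for files saved with Windows line endings; it is not part of the original answer.

```javascript
// Hypothetical sample entry with Unix line endings.
const poText = 'msgid ""\n"line one "\n"line two"\nmsgstr ""\n';

// The simpler pattern from the answer: the text is in group 1.
const pattern = /msgid ["']{2}\n([\s\S]*?)\nmsgstr /;
const result = pattern.exec(poText);
const captured = result ? result[1] : null;

// Assumed variant that also tolerates \r\n line endings.
const crlfPattern = /msgid ["']{2}\r?\n([\s\S]*?)\r?\nmsgstr /;
const crlfText = poText.replace(/\n/g, "\r\n");
const crlfResult = crlfPattern.exec(crlfText);
const crlfCaptured = crlfResult ? crlfResult[1] : null;

console.log(JSON.stringify(captured));
console.log(JSON.stringify(crlfCaptured));
```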

Upvotes: 2

Jerry

Reputation: 71598

You could perhaps try this regex?

msgid ""((?:.|[\n\r])+)msgstr ""

((?:.|[\n\r])+) is your capturing group.

(?:.|[\n\r])+ matches either . or [\n\r] one or more times; \n and \r are the newline and carriage return characters.

Tested
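One thing worth checking with this pattern is that + is greedy, so on a file with several entries the group may run to the last msgstr "". The sketch below compares it with a lazy +? variant on a made-up two-entry string; the lazy variant and the sample text are assumptions for illustration.

```javascript
// Hypothetical input containing two entries.
const twoEntries =
  'msgid ""\n"first"\nmsgstr ""\n\nmsgid ""\n"second"\nmsgstr ""\n';

// Pattern from the answer: greedy quantifier.
const greedy = /msgid ""((?:.|[\n\r])+)msgstr ""/;
// Assumed variant: lazy quantifier stops at the nearest msgstr "".
const lazy = /msgid ""((?:.|[\n\r])+?)msgstr ""/;

const greedyGroup = twoEntries.match(greedy)[1];
const lazyGroup = twoEntries.match(lazy)[1];

console.log(JSON.stringify(greedyGroup)); // spans both entries
console.log(JSON.stringify(lazyGroup));   // only the first entry
```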

Upvotes: 1

Andrew Clark

Reputation: 208655

Here is one way to extract all of that text:

var match = text.replace(/msgid ""([\s\S]*?)msgstr ""/, "$1");

Example: http://jsfiddle.net/bqk79/

The [\s\S] is a character class that matches any character, including line breaks, so [\s\S]*? lazily matches any number of any character, stopping at the first msgstr "" that follows. In other languages you could use the s or DOTALL flag to make . match line breaks; JavaScript did not support this when this answer was written, although ES2018 later added an s (dotAll) flag.
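For engines that implement ES2018 or later, the s flag can stand in for the [\s\S] class. The following sketch compares the two on a made-up sample string; it assumes a runtime with dotAll support.

```javascript
// Hypothetical sample entry.
const text = 'msgid ""\n"one "\n"two"\nmsgstr ""';

// With the s (dotAll) flag, . also matches line breaks (ES2018+).
const dotAllMatch = text.match(/msgid ""(.*?)msgstr ""/s);
// The character-class form works without the flag.
const classMatch = text.match(/msgid ""([\s\S]*?)msgstr ""/);
// Without the flag, . stops at the first line break, so nothing matches.
const withoutFlag = text.match(/msgid ""(.*?)msgstr ""/);

console.log(JSON.stringify(dotAllMatch[1]));
console.log(JSON.stringify(classMatch[1]));
console.log(withoutFlag);
```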

Note that your regex doesn't make any mention of single quotes, but if you need to be able to match between msgid '' and msgstr '' as well you can use the following:

var match = text.replace(/msgid (['"]{2})([\s\S]*?)msgstr \1/, "$2");
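If the file holds several entries, one possible extension is to add the g flag and loop with exec, then strip the surrounding quotes from each line. This is only a sketch: `extractMsgids` is a hypothetical helper, it assumes each continuation line is a single quoted string without escaped quotes, and it only sees entries whose msgstr is empty.

```javascript
// Hypothetical helper: collects the text of every multi-line msgid
// that is followed by an empty msgstr.
function extractMsgids(text) {
  const entry = /msgid (['"]{2})([\s\S]*?)msgstr \1/g;
  const results = [];
  let m;
  while ((m = entry.exec(text)) !== null) {
    const pieces = m[2]
      .split(/\r?\n/)                 // one quoted string per line
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .map((line) => line.replace(/^["']|["']$/g, "")); // drop outer quotes
    results.push(pieces.join(""));
  }
  return results;
}

// Made-up input with two entries.
const poFile =
  'msgid ""\n"Hello "\n"world"\nmsgstr ""\n\nmsgid ""\n"Second"\nmsgstr ""\n';
const msgids = extractMsgids(poFile);
console.log(msgids);
```

For anything beyond a quick extraction, a dedicated .po parser is likely a safer choice than a regex.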

Upvotes: 10
