Élisa Plessis

Reputation: 101

How to parse multiple line complex regex pattern?

I'm trying to match this data:

Combien ?

Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum.

Combien 2 ?

Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.

The pattern should match each question together with its respective answer.

Ex:

Question 1 = Combien

Answer 1 = Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.
Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.

I tried using a positive lookahead (JavaScript), but it didn't work.

The patterns I tried:

^(.+)\xA0*(?=\?)\n* 
^(.+)\xA0*(?!\?)$

Upvotes: 4

Views: 98

Answers (4)

The fourth bird

Reputation: 163362

If the delimiter is a question mark and you want to match everything up to the next question, you could use a negative lookahead (?! instead of a positive one, to assert that a line does not have the question-like format:

^(.+ \?)\n((?:\n(?!.* \?$).*)*)

Explanation

  • ^ Start of a line (the pattern is used with the m flag)
  • (.+ \?) Match any char 1+ times ending with a space and question mark
  • \n Match newline
  • ( Capturing group
    • (?:\n(?!.* \?$).*)* Match a newline, then use a negative lookahead to make sure that the following line does not end with a space and question mark, then match the rest of that line. Repeat that 0+ times
  • ) Close capturing group
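
To see what the negative lookahead does on its own, here is a small sketch that applies it to single lines. The helper name continuesAnswer and the sample lines are made up for illustration, and it assumes a question always ends with a space followed by a question mark.

```javascript
// Hypothetical helper: true when a line may be part of an answer,
// i.e. it does NOT end with " ?" (the same check as the lookahead above).
const continuesAnswer = (line) => /^(?!.* \?$).*$/.test(line);

// Illustrative sample lines, not taken from a real data set.
const lines = ["Combien ?", "Lorem ipsum.", "", "Combien 2 ?"];

const report = lines.map((line) => ({
  line,
  continuesAnswer: continuesAnswer(line),
}));

console.log(report);
```

Question lines are rejected by the lookahead, while answer lines and empty lines pass, which is why the repeated group stops right before the next question.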


const regex = /^(.+ \?)\n((?:\n(?!.* \?$).*)*)/gm;
const str = `Combien ?

Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum.

test

Combien 2 ?

Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.`;
let m;

while ((m = regex.exec(str)) !== null) {
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  console.log("Question: " + m[1]);
  console.log("Answer: " + m[2]);
}
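
If you prefer to collect the pairs in an array, the same pattern could be used with matchAll. This is only a sketch: it assumes String.prototype.matchAll is available (ES2020), and the names qaPattern, sample and pairs are made up for the example. The answers are trimmed because the second group also captures the blank lines around them.

```javascript
// Same pattern as above; matchAll requires the g flag.
const qaPattern = /^(.+ \?)\n((?:\n(?!.* \?$).*)*)/gm;

// Shortened, illustrative input with a two-paragraph first answer.
const sample =
  "Combien ?\n\nLorem ipsum.\n\ntest\n\nCombien 2 ?\n\nLorem ipsum. Lorem ipsum.";

// Destructure each match: skip the full match, keep groups 1 and 2.
const pairs = Array.from(sample.matchAll(qaPattern), ([, question, answer]) => ({
  question,
  answer: answer.trim(), // drop the surrounding blank lines
}));

console.log(pairs);
```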

Upvotes: 1

adiga

Reputation: 35222

You could use (.*\?)\n+(.+) to get the question and the answer into separate capturing groups:

const str = `Combien ?

Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum.

Combien 2 ?

Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.`;

let regex = /(.*\?)\n+(.+)/g, 
    matches = [], m;

while(m = regex.exec(str))
  matches.push({ question: m[1], answer: m[2] })

console.log(matches)
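
One caveat worth checking against your real data: . does not match a newline, so (.+) stops at the end of the first answer line. The sketch below uses a made-up input (text) with a two-paragraph answer to show the effect; the variable names are only for illustration.

```javascript
// Illustrative input: the first answer spans two paragraphs.
const text =
  "Combien ?\n\nFirst paragraph.\n\nSecond paragraph.\n\nCombien 2 ?\n\nOnly paragraph.";

// Same pattern as above, collected with matchAll (assumes ES2020).
const firstLineOnly = [...text.matchAll(/(.*\?)\n+(.+)/g)].map(
  ([, question, answer]) => ({ question, answer })
);

// "Second paragraph." is not part of any captured answer.
console.log(firstLineOnly);
```

So this pattern fits the sample in the question, where every answer is a single line, but a multi-line answer would need a different approach.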

You could also use \n?(.+) to match the questions and answers one line at a time. You could then split them into separate arrays based on whether their index is even or odd:

const str = `Combien ?

Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum.

Combien 2 ?

Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.`;

let regex = /\n?(.+)/g, matches = [], m;

while(m = regex.exec(str))
  matches.push(m[1])

console.log("matches \n", matches)

const questions = [], answers = [];
matches.forEach((m, i) => i % 2 ? answers.push(m) : questions.push(m))

console.log("questions \n", questions)
console.log("answers \n", answers)

Upvotes: 0

Pushpesh Kumar Rajwanshi

Reputation: 18357

This regex should capture your question in group 1 and the answer in group 2.

^(\S+(?: \S+)*\s*\?)\s+(\S+(?: \S+)*)$

JS demo:

const s = `Combien ?

Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum.

Combien 2 ?

Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.
`
let m = null;

// g lets exec() advance through the string; m makes ^ and $ match at line boundaries.
const reg = /^(\S+(?: \S+)*\s*\?)\s+(\S+(?: \S+)*)$/gm;
while ((m = reg.exec(s)) != null) {
    console.log("Question: " + m[1])
    console.log("Answer: " + m[2])
}

Upvotes: 1

trincot

Reputation: 350310

You could use split with a capture group that will take the question:

str.split(/\s*?^(.*?)\s*\?\s*?[\r\n]+/m).slice(1);

The slice will skip any text that precedes the first question. The result will be an array with an even number of entries, alternating question and answer.
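
The reason this works is that split includes the text matched by a capture group in the resulting array. A minimal sketch of that behavior, using a made-up string rather than the question's data:

```javascript
// Without a capture group, the separators are dropped.
const withoutGroup = "a1b2c".split(/\d/);

// With a capture group, each captured separator is kept as its own entry.
const withGroup = "a1b2c".split(/(\d)/);

console.log(withoutGroup); // [ 'a', 'b', 'c' ]
console.log(withGroup);    // [ 'a', '1', 'b', '2', 'c' ]
```

In the pattern above the captured "separator" is the question text, so the questions end up between the answers in the array.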

var str = `
Combien ?

Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum.

Combien 2 ?

Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.
`;

var qa = str.split(/\s*?^(.*?)\s*\?\s*?[\r\n]+/m).slice(1);

console.log(qa);

If you want the result as an array of objects, where each object has a question and an answer property, then chain a reduce to the above code:

str.split(/\s*?^(.*?)\s*\?\s*?[\r\n]+/m).slice(1)
   .reduce((acc, m, i, arr) => 
       i%2 ? acc.concat({ question: arr[i-1], answer: m.trim() }) : acc, 
   []);

var str = `
Combien ?

Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum.

Combien 2 ?

Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.
`;

var qa = str.split(/\s*?^(.*?)\s*\?\s*?[\r\n]+/m).slice(1)
   .reduce((acc, m, i, arr) => 
       i%2 ? acc.concat({ question: arr[i-1], answer: m.trim() }) : acc, 
   []);

console.log(qa);

Upvotes: 1
