Reputation: 101
I'm trying to match this data:
Combien ?
Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum.
Combien 2 ?
Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.
The pattern should match each question together with its respective answer.
For example:
Question 1 = Combien
Answer 1 = Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.
Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.
I tried using a positive lookahead (JavaScript), but it didn't work.
The patterns I tried:
^(.+)\xA0*(?=\?)\n*
^(.+)\xA0*(?!\?)$
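The \xA0 in these patterns suggests the question mark may be preceded by a no-break space (common in French typography) rather than a regular space. Below is a minimal sketch, not a definitive solution. It assumes that a question is a whole line ending in "?", optionally preceded by regular or no-break spaces; that lines are separated by "\n"; and that no answer line itself ends in "?". The names qaPattern and parseQA are hypothetical.

```javascript
// Hypothetical helper: split text into { question, answer } pairs.
// Assumptions (not from the original question): a question is a whole line
// ending in "?", optionally preceded by spaces or no-break spaces (\xA0),
// lines are separated by "\n", and no answer line ends in "?".
const qaPattern = /^(.*?)[ \xA0]*\?$((?:\n(?!.*\?$).*)*)/gm;

function parseQA(text) {
  // matchAll requires the g flag; each match exposes the capture groups.
  return [...text.matchAll(qaPattern)].map((m) => ({
    // Group 1: the question text without the trailing spaces and "?".
    question: m[1],
    // Group 2: every following line that is not a question. It starts with
    // "\n" and may contain blank lines, so trim it.
    answer: m[2].trim(),
  }));
}

const sample =
  "Combien\xA0?\nLorem ipsum. Lorem ipsum.\nLorem ipsum.\nCombien 2 ?\nLorem ipsum.Lorem ipsum.";
console.log(parseQA(sample));
```

Because group 2 only accepts lines that do not end in "?", an answer can span any number of lines and stops right before the next question.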
Upvotes: 4
Views: 98
Reputation: 163362
If the delimiter is a question mark and you want to match until the next question, then instead of a positive lookahead you could use a negative lookahead (?!...) to assert that a line does not have the question-like format:
^(.+ \?)\n(.*(?:\n(?!.* \?$).*)*)
Explanation
^ Start of a line (the m flag makes ^ match at the start of every line)
(.+ \?) Capture group 1: match any char 1+ times, ending with a space and a question mark
\n Match a newline
( Start capture group 2
.* Match the rest of the line that follows the question
(?:\n(?!.* \?$).*)* Match a newline, use a negative lookahead to make sure the next line does not end with a space and a question mark, then match that line. Repeat 0+ times
) Close capture group 2
const regex = /^(.+ \?)\n(.*(?:\n(?!.* \?$).*)*)/gm;
const str = `Combien ?
Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum.
test
Combien 2 ?
Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.`;
let m;
while ((m = regex.exec(str)) !== null) {
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
console.log("Question: " + m[1]);
console.log("Answer: " + m[2]);
}
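If you prefer an array of objects over logging inside an exec loop, String.prototype.matchAll (ES2020) can collect the matches in one expression. This sketch uses the pattern ^(.+ \?)\n(.*(?:\n(?!.* \?$).*)*), which lets group 2 start on the line directly after the question, so it does not depend on a blank line separating question and answer. It still assumes a regular space before each "?".

```javascript
// Sketch: collect question/answer pairs with matchAll instead of an exec loop.
// Assumes a question line ends with a regular space followed by "?".
const regex = /^(.+ \?)\n(.*(?:\n(?!.* \?$).*)*)/gm;

const str = `Combien ?
Lorem ipsum. Lorem ipsum.
test
Combien 2 ?
Lorem ipsum.Lorem ipsum.`;

// matchAll yields one match array per question; m[1] is the question, m[2] the answer.
const pairs = Array.from(str.matchAll(regex), (m) => ({
  question: m[1],
  answer: m[2],
}));

console.log(pairs);
```

The multi-line answer ("Lorem ipsum. Lorem ipsum." followed by "test") ends up in a single answer string, joined by "\n".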
Upvotes: 1
Reputation: 35222
You could use (.*\?)\n+(.+) to capture the question and the answer in separate groups:
const str = `Combien ?
Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum.
Combien 2 ?
Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.`;
let regex = /(.*\?)\n+(.+)/g,
matches = [], m;
while(m = regex.exec(str))
matches.push({ question: m[1], answer: m[2] })
console.log(matches)
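One thing to keep in mind with this pattern: "." does not match "\n", so (.+) captures only the first line after each question. A quick check, assuming a hypothetical answer that spans two lines:

```javascript
// Demonstrates a limitation of /(.*\?)\n+(.+)/g: "." does not match "\n",
// so only the first line after each question is captured as the answer.
const str = `Combien ?
first answer line
second answer line
Combien 2 ?
another answer`;

const regex = /(.*\?)\n+(.+)/g;
const matches = [];
let m;
while ((m = regex.exec(str)) !== null) {
  matches.push({ question: m[1], answer: m[2] });
}

console.log(matches);
// "second answer line" does not appear in any answer.
```

This is fine when every answer is a single line, as in the sample data. For multi-line answers, a pattern that keeps consuming lines until the next question is needed.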
You could also use \n?(.+) to match the questions and answers one line at a time. You could then split them into separate arrays based on their indexes (even indexes are questions, odd indexes are answers):
const str = `Combien ?
Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum.
Combien 2 ?
Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.`;
let regex = /\n?(.+)/g, matches = [], m;
while(m = regex.exec(str))
matches.push(m[1])
console.log("matches \n", matches)
const questions = [], answers = [];
matches.forEach((m, i) => i % 2 ? answers.push(m) : questions.push(m))
console.log("questions \n", questions)
console.log("answers \n", answers)
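The even/odd split relies on every question and every answer occupying exactly one line. Under the same one-line assumption, the matched lines can also be paired directly into objects by stepping through them two at a time, as in this sketch:

```javascript
// Sketch: pair consecutive non-empty lines as { question, answer }.
// Assumes strict alternation: question line, answer line, question line, ...
const str = `Combien ?
Lorem ipsum. Lorem ipsum.
Combien 2 ?
Lorem ipsum.Lorem ipsum.`;

// Same idea as /\n?(.+)/g above: every non-empty line becomes one entry.
const lines = Array.from(str.matchAll(/\n?(.+)/g), (m) => m[1]);

const pairs = [];
// Step two entries at a time; an unpaired trailing line is ignored.
for (let i = 0; i + 1 < lines.length; i += 2) {
  pairs.push({ question: lines[i], answer: lines[i + 1] });
}

console.log(pairs);
```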
Upvotes: 0
Reputation: 18357
This regex should capture the question in group 1 and the answer in group 2:
^(\S+(?: \S+)*\s*\?)\s+(\S+(?: \S+)*)$
JS demo:
const s = `Combien ?
Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum.
Combien 2 ?
Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.
`
let m = null
const reg = new RegExp(/^(\S+(?: \S+)*\s*\?)\s+(\S+(?: \S+)*)$/, 'gm');
while ((m = reg.exec(s)) != null) {
console.log("Question: " + m[1])
console.log("Answer: " + m[2])
}
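Named capture groups (ES2018) can make the two groups self-describing. This sketch uses the same pattern with the illustrative group names question and answer (not part of the original answer) and reads them from m.groups:

```javascript
// Sketch: the same pattern with named groups; "question" and "answer" are
// illustrative names chosen for this example.
const s = `Combien ?
Lorem ipsum. Lorem ipsum.
Combien 2 ?
Lorem ipsum.Lorem ipsum.`;

const reg = /^(?<question>\S+(?: \S+)*\s*\?)\s+(?<answer>\S+(?: \S+)*)$/gm;

// Each match exposes its named groups on m.groups; spread them into a plain object.
const pairs = Array.from(s.matchAll(reg), (m) => ({ ...m.groups }));

console.log(pairs);
```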
Upvotes: 1
Reputation: 350310
You could use split with a capturing group that captures the question:
str.split(/\s*?^(.*?)\s*\?\s*?[\r\n]+/m).slice(1);
The slice will skip any text that precedes the first question. The result will be an array with an even number of entries, alternating question and answer.
var str = `
Combien ?
Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum.
Combien 2 ?
Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.
`;
var qa = str.split(/\s*?^(.*?)\s*\?\s*?[\r\n]+/m).slice(1);
console.log(qa);
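This approach relies on a documented behavior of String.prototype.split: when the separator is a regular expression containing capturing groups, each captured substring is spliced into the result array between the pieces. A small illustration, using a shortened version of the sample text:

```javascript
// When a split separator contains a capturing group, String.prototype.split
// inserts each captured substring into the result between the pieces.
const withoutGroup = "a1b2c".split(/\d/);  // ["a", "b", "c"]
const withGroup = "a1b2c".split(/(\d)/);   // ["a", "1", "b", "2", "c"]

console.log(withoutGroup);
console.log(withGroup);

// On the question/answer text, the piece before the first question (here an
// empty string) comes first, which is why .slice(1) is applied.
const qa = "\nCombien ?\nLorem ipsum.\nCombien 2 ?\nLorem ipsum."
  .split(/\s*?^(.*?)\s*\?\s*?[\r\n]+/m);
console.log(qa); // ["", "Combien", "Lorem ipsum.", "Combien 2", "Lorem ipsum."]
```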
If you want the result as an array of objects, each with a question and an answer property, chain a reduce call onto the code above:
str.split(/\s*?^(.*?)\s*\?\s*?[\r\n]+/m).slice(1)
.reduce((acc, m, i, arr) =>
i%2 ? acc.concat({ question: arr[i-1], answer: m.trim() }) : acc,
[]);
var str = `
Combien ?
Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum. Lorem ipsum.
Combien 2 ?
Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum. Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.Lorem ipsum.
`;
var qa = str.split(/\s*?^(.*?)\s*\?\s*?[\r\n]+/m).slice(1)
.reduce((acc, m, i, arr) =>
i%2 ? acc.concat({ question: arr[i-1], answer: m.trim() }) : acc,
[]);
console.log(qa);
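Because acc.concat returns a new array on every pair, the reduce version copies the accumulator each time. A plain loop that walks the split result two entries at a time produces the same objects without the copying. In this sketch the helper name toPairs is hypothetical:

```javascript
// Hypothetical helper: turn the alternating [question, answer, ...] array
// produced by the split into { question, answer } objects.
function toPairs(parts) {
  const result = [];
  // Step two entries at a time; an unpaired trailing entry is ignored.
  for (let i = 0; i + 1 < parts.length; i += 2) {
    result.push({ question: parts[i], answer: parts[i + 1].trim() });
  }
  return result;
}

const str = `
Combien ?
Lorem ipsum. Lorem ipsum.
Combien 2 ?
Lorem ipsum.Lorem ipsum.
`;

const qa = toPairs(str.split(/\s*?^(.*?)\s*\?\s*?[\r\n]+/m).slice(1));
console.log(qa);
```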
Upvotes: 1