Reputation: 1555
Last week I was trying to find the parts of a text that contain specific words, delimited by punctuation characters. This works well:
[^.?!:]*\b(why|how)\b[^.?!]*[.?!]
On the sentence "How did you do it? bla bla bla! why did you do it?", it gives me the following output:
"How did you do it?"
"why did you do it?"
Now I am trying to add the hyphen character: I want a hyphen with a space on each side to act as a new sentence delimiter:
"The man went walking upstairs - why was he there?"
That would return: "why was he there?"
It would follow these rules:
hello - bye -> this would be the only one to be matched
hello-bye -> not matched
hello -bye -> not matched
hello- bye -> not matched
Using negation, I tried to add that part:
[^.?!:\\s\\-\\s] => ignore everything that ends with a "." or a "?" or a "!" or a ":" or a " - "
It doesn't work, but as I am pretty bad at regex, I am probably missing something obvious.
var regex = /[^.?!:\\s\\-\\s]*\b(why|how)\b[^.?!]*[.?!]/igm;
var text = "Here I am - why did you want to see me?";
var match;
while ((match = regex.exec(text)) != null) {
    console.log(match);
}
Output :
Here I am - why did you want to see me?
Expected output :
why did you want to see me?
Upvotes: 1
Views: 67
Reputation: 626748
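There are two issues that I see: the doubled backslashes (\\s, \\-) inside a regex literal match a literal backslash rather than forming \s or \-, and a character class cannot match a multi-character sequence, so [^.?!:\s\-\s]* needs to be replaced with (?:(?!\s-\s)[^.?!:])*.
You may use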
var regex = /(?:(?!\s-\s)[^.?!:])*\b((?:why|how)\b[^.?!]*)[.?!]/ig
where (?:(?!\s-\s)[^.?!:])* is a tempered greedy token that matches any character other than ., ?, ! and : that does not start a whitespace + - + whitespace pattern.
var regex = /(?:(?!\s-\s)[^.?!:])*\b((?:why|where|pourquoi|how)\b[^.?!]*)[.?!]/ig;
var text = "L'Inde a déjà acheté nos rafales, pourquoi la France ne le -dirait-elle pas ?";
var match;
while ((match = regex.exec(text)) != null) {
    console.log(match[1]);
}
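To check the pattern against the sentences from the question, here is a minimal sketch. The regex is the one given above; the helper name extractQuestions and the use of String.prototype.matchAll are illustrative assumptions, not part of the original answer. Note that capture group 1 starts at the keyword and stops before the closing punctuation.

```javascript
// Pattern copied from the answer above; only the wrapper around it is new.
const QUESTION_RE = /(?:(?!\s-\s)[^.?!:])*\b((?:why|how)\b[^.?!]*)[.?!]/gi;

// Hypothetical helper: collect capture group 1 (keyword up to, but not
// including, the closing punctuation) for every match in the text.
function extractQuestions(text) {
  return Array.from(text.matchAll(QUESTION_RE), (m) => m[1].trim());
}

console.log(extractQuestions("Here I am - why did you want to see me?"));
// → ["why did you want to see me"]
console.log(extractQuestions("How did you do it? bla bla bla! why did you do it?"));
// → ["How did you do it", "why did you do it"]
```

If the closing punctuation is wanted as well, it could be moved inside the capturing group.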
Upvotes: 1
Reputation: 6813
Given your 4 examples, this works.
/\s-\s(\w*)/g
Test it here - https://regex101.com/r/YQhRBI/1
This captures any run of word characters right after the delimiter. If you want to match specific keywords, you'd swap the (\w*)
with (why|how|who|what|where|when)
I think if you had a paragraph, you'd have to be sure to find a way to terminate the answer portion with a specific delimiter. If this was more along the lines of a question/answer per new line, then you'd need only to end the regex with an end-of-line anchor.
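For the one-question-per-line case, a rough sketch could look like the following. The keyword list, the m flag and the helper name questionsPerLine are assumptions made for illustration; the idea is only that $ in multiline mode ends the capture at the end of each line.

```javascript
// " - " delimiter, then a keyword, then the rest of the line.
// The m flag makes $ match at every line end; i ignores case.
const LINE_RE = /\s-\s((?:why|how|who|what|where|when)\b.*)$/gim;

// Hypothetical helper: return the captured question from each matching line.
function questionsPerLine(text) {
  return Array.from(text.matchAll(LINE_RE), (m) => m[1]);
}

const sample = [
  "The man went walking upstairs - why was he there?",
  "hello-bye",
  "hello -bye",
  "hello- bye",
  "hello - bye",
].join("\n");

console.log(questionsPerLine(sample));
// → ["why was he there?"]
```

The last line, "hello - bye", has the right delimiter but no keyword after it, so it is skipped in this variant.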
Upvotes: 1
Reputation: 1602
[ ]
is always a character class, which means that at one position it can match only one character. The "negation" in your example is in fact probably not doing what you think it does.
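A small sketch of the difference, using the class from the question's regex literal (the variable names are just for illustration):

```javascript
// A character class matches exactly one character from its set.
const classMatch = "how".match(/[why]/);
// A group with alternation matches one of the whole alternatives.
const groupMatch = "how".match(/(why|how)/);

console.log(classMatch[0]); // → "h"
console.log(groupMatch[0]); // → "how"

// In a regex literal, \\ is a literal backslash, so this class excludes
// ".", "?", "!", ":", "\" and the letter "s", not whitespace or hyphens.
const negated = /[^.?!:\\s\\-\\s]/;

console.log(negated.test("s")); // → false (the letter s is excluded)
console.log(negated.test(" ")); // → true  (a space is still allowed)
console.log(negated.test("-")); // → true  (a hyphen is still allowed)
```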
What you probably want to match is either the beginning of the string, the end of a sentence, or a dash with a space on each side, so just replace it with (^|[.?!]| - )\b((why|how)...etc)
. You will need some post-processing of the result, because lookbehind assertions are only available in JavaScript engines that implement ES2018 or later.
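One possible way to flesh this out is sketched below, with a few additions that are assumptions rather than part of the suggestion: \s* to skip the space that follows a sentence end, a lookahead so the closing punctuation is not consumed and can delimit the next sentence, and a hypothetical helper questionsAtSentenceStart that does the post-processing by reading the capture groups.

```javascript
// Delimiter (string start, sentence end, or " - "), optional spaces,
// then the keyword clause. The closing punctuation is captured inside a
// lookahead so it stays available as the delimiter of the next match.
const SENTENCE_START_RE =
  /(?:^|[.?!]| - )\s*\b((?:why|how)\b[^.?!]*)(?=([.?!]))/gi;

// Hypothetical helper: the post-processing step, gluing the clause
// (group 1) and its closing punctuation (group 2) back together.
function questionsAtSentenceStart(text) {
  return Array.from(text.matchAll(SENTENCE_START_RE), (m) => m[1] + m[2]);
}

console.log(questionsAtSentenceStart("Here I am - why did you want to see me?"));
// → ["why did you want to see me?"]
console.log(questionsAtSentenceStart("How did you do it? bla bla bla! why did you do it?"));
// → ["How did you do it?", "why did you do it?"]
console.log(questionsAtSentenceStart("hello -why now?"));
// → []
```

Unlike the pattern in the question, this variant only finds a keyword that comes directly after a delimiter, so a sentence such as "Tell me why you left." would not be matched.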
Upvotes: 1