krakig

Reputation: 1555

Finding characters with spaces

Last week I was trying to find the parts of a text that contain specific words, delimited by punctuation characters. This works well:

[^.?!:]*\b(why|how)\b[^.?!]*[.?!]

For the sentence "How did you do it? bla bla bla! why did you do it?", it gives me the following output:

"How did you do it?"
"why did you do it?"

Now I am trying to add the hyphen character: I want to detect a hyphen with a space on each side and treat it as a new sentence delimiter:

"The man went walking upstairs - why was he there?"

That should return: "why was he there?"

It would follow the following rules:

hello - bye -> this would be the only one to be matched
hello-bye -> not matched
hello -bye -> not matched
hello- bye -> not matched

Using the negation, I tried to add this part:

[^.?!:\\s\\-\\s] => ignore everything that ends with a "." or a "?" or a "!" or a ":" or a " - "

It doesn't work, but as I am pretty bad at regex, I am probably missing something obvious.

var regex = /[^.?!:\\s\\-\\s]*\b(why|how)\b[^.?!]*[.?!]/igm
var text = "Here I am - why did you want to see me?"

var match;

while ((match = regex.exec(text)) != null) {
    console.log(match);
}

Output :

Here I am - why did you want to see me?

Expected output :

why did you want to see me?

Upvotes: 1

Views: 67

Answers (3)

Wiktor Stribiżew

Reputation: 626748

There are two issues that I see:

  • Backslashes: use a single backslash inside a regex literal and a double backslash in a RegExp constructor string.
  • A sequence is used inside a character class, where every item stands for a single character. Replace [^.?!:\s\-\s] with (?:(?!\s-\s)[^.?!:])*.
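To make the first point concrete, here is a small sketch of how the same pattern has to be written in a literal and in a constructor string, and what the doubled backslashes end up meaning inside a literal. The variable names are only for illustration.

```javascript
// The same pattern written two ways.
var fromLiteral = /\s-\s/;                    // single backslashes in a literal
var fromConstructor = new RegExp("\\s-\\s");  // doubled inside a string

console.log(fromLiteral.source === fromConstructor.source); // true

// With single backslashes in a string, "\s" is just the letter "s".
var tooFew = new RegExp("\s-\s");
console.log(tooFew.source); // s-s

// In a literal, "\\s" means a literal backslash followed by the letter "s",
// so this class excludes the letter "s" and does not exclude whitespace.
var doubled = /[^.?!:\\s\\-\\s]/;
console.log(doubled.test("s")); // false: "s" is in the excluded set
console.log(doubled.test(" ")); // true: a space is not excluded
```

This also explains why the attempt in the question still returned the whole sentence: the class never excluded the " - " sequence.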

You may use

var regex = /(?:(?!\s-\s)[^.?!:])*\b((?:why|how)\b[^.?!]*)[.?!]/ig

where (?:(?!\s-\s)[^.?!:])* is a tempered greedy token: it matches any character other than ., ?, ! or :, as long as that character does not start a whitespace + hyphen + whitespace sequence.

var regex = /(?:(?!\s-\s)[^.?!:])*\b((?:why|where|pourquoi|how)\b[^.?!]*)[.?!]/ig;
var text = "L'Inde a déjà acheté nos rafales, pourquoi la France ne le -dirait-elle pas ?";
var match;
while ((match = regex.exec(text)) != null) {
    console.log(match[1]);
}
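For comparison, here is a sketch that runs the same pattern (with only why and how as keywords) on the two inputs from the question. findQuestions is a hypothetical helper name, not part of any API. Note that capture group 1 stops before the closing punctuation, so the results come back without the final "?".

```javascript
// Hypothetical helper: collects capture group 1 of every match.
function findQuestions(text) {
    // Created per call so lastIndex from a previous run cannot leak in.
    var regex = /(?:(?!\s-\s)[^.?!:])*\b((?:why|how)\b[^.?!]*)[.?!]/ig;
    var results = [];
    var match;
    while ((match = regex.exec(text)) !== null) {
        results.push(match[1]);
    }
    return results;
}

console.log(findQuestions("Here I am - why did you want to see me?"));
// [ 'why did you want to see me' ]

console.log(findQuestions("How did you do it? bla bla bla! why did you do it?"));
// [ 'How did you do it', 'why did you do it' ]
```

If the closing punctuation is wanted in the result, moving [.?!] inside the capturing group should be enough.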

Upvotes: 1

jusopi

Reputation: 6813

Given your 4 examples, this works.

/\s-\s(\w*)/g

Test it here - https://regex101.com/r/YQhRBI/1

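The (\w*) group captures the first word after the delimiter. If you want to match specific keywords instead, swap (\w*) for (why|how|who|what|where|when). Use parentheses there: square brackets would create a character class that matches a single character.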

I think if you had a paragraph, you'd have to be sure to find a way to terminate the answer portion with a specific delimiter. If this was more along the lines of a question/answer per new line, then you'd need only to end the regex with an end-of-line anchor.
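As a quick check, here is a sketch that runs the pattern against the four examples from the question. wordAfterDelimiter is a hypothetical helper name used only for this illustration.

```javascript
// Hypothetical helper: returns the word captured after " - ", or null.
function wordAfterDelimiter(text) {
    var match = /\s-\s(\w*)/.exec(text);
    return match ? match[1] : null;
}

var cases = ["hello - bye", "hello-bye", "hello -bye", "hello- bye"];
cases.forEach(function (text) {
    console.log(JSON.stringify(text), "->", wordAfterDelimiter(text));
});
// "hello - bye" -> bye
// "hello-bye" -> null
// "hello -bye" -> null
// "hello- bye" -> null
```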

Upvotes: 1

Gerard van Helden

Reputation: 1602

[ ] is always a character class, which means that it matches exactly one character at one position. The "negation" in your example is in fact probably not even doing what you think it does.

What you probably want to match is either the beginning of the string, the end of a sentence, or a dash with a space on each side, so just replace it with (^|[.?!]| - )\b((why|how)...etc). You will need some post-processing of the result, because look-behind assertions were only added to JavaScript in ES2018 and older engines do not support them.
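A minimal sketch of this idea follows. It rests on a few assumptions of mine that go beyond the pattern above: a \s* after the delimiter to allow the space that normally follows a sentence end, a lookahead for the closing punctuation so that it can still serve as the delimiter of the next sentence, and the hypothetical helper name findQuestionsByDelimiter. The post-processing is just reading the capture groups instead of the whole match.

```javascript
// Hypothetical helper: a sentence starts at the beginning of the string,
// after ., ? or !, or after " - ".
function findQuestionsByDelimiter(text) {
    // Group 1: delimiter, group 2: sentence body, group 3: closing punctuation.
    // The closing punctuation sits in a lookahead so it is not consumed.
    var regex = /(^|[.?!]| - )\s*\b((?:why|how)\b[^.?!]*)(?=([.?!]))/ig;
    var results = [];
    var match;
    while ((match = regex.exec(text)) !== null) {
        // Post-processing: drop the delimiter, re-attach the punctuation.
        results.push(match[2] + match[3]);
    }
    return results;
}

console.log(findQuestionsByDelimiter("Here I am - why did you want to see me?"));
// [ 'why did you want to see me?' ]

console.log(findQuestionsByDelimiter("hello-why not?"));
// []
```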

Upvotes: 1
