James De Souza

Reputation: 668

Extract sentences from a string that contain a specific word

I have some long text like the one below:

Hello everyone. My name is James! Tell me your names? so I'd greet you...

I am trying to find the word name and return the sentences that contain it.

I thought of the long way (find the index of the word, then loop backwards from it looking for a new line, dot, question mark, etc.), but it doesn't look efficient!

Is there a faster way to achieve this?

Upvotes: 2

Views: 2698

Answers (1)

Chase Ingebritson

Reputation: 1579

Essentially, you need a regex that matches the boundaries between sentences, split the text at those boundaries to get an array of sentences, then filter the array by checking whether each sentence includes the provided word.

Note that this function requires the input string to use correct capitalization and punctuation.

// The input string
let input = "Hello everyone. My name is James! Tell me your names? So I'd greet you..."

// Our function that finds sentences that include a given word
// Input: Word - The word you want to find
// Input: Text - The text you'll be searching through
// Output: An array of sentences from our text input that include the word input
function getSentencesWithWord(word, text) {
  // Search for sentences, insert a pipe, then split on the pipe
  const sentenceArray = text.replace(/([.?!])\s*(?=[A-Z])/g, "$1|").split("|")

  // Filter the array, keeping only the sentences that include the word, and return the result
  return sentenceArray.filter(sentence => sentence.includes(word))
}

// Run a test of our function
console.log(getSentencesWithWord('name', input))
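Note that includes does a substring check, so searching for 'name' also returns the sentence containing "names". If only the whole word should match, one possible variation is sketched below. The names getSentencesWithWholeWord and escapeRegExp are made up for this illustration and are not part of the original answer, and \b only treats ASCII letters, digits and underscore as word characters, so treat it as a starting point rather than a complete solution.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
}

// Hypothetical variant of getSentencesWithWord that only matches whole words
function getSentencesWithWholeWord(word, text) {
  // Same sentence splitting as in the answer above
  const sentenceArray = text.replace(/([.?!])\s*(?=[A-Z])/g, "$1|").split("|")

  // \b is a word boundary, so "name" no longer matches inside "names"
  const wholeWord = new RegExp(`\\b${escapeRegExp(word)}\\b`)

  return sentenceArray.filter(sentence => wholeWord.test(sentence))
}

const sampleText = "Hello everyone. My name is James! Tell me your names? So I'd greet you..."
console.log(getSentencesWithWholeWord('name', sampleText))
// → [ 'My name is James!' ]
```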

Thanks to @YanFoto's comment referencing this answer.


Edit

Here's a short explanation of the regex, pulled from the post linked above:

1) Find punctuation marks (one of . or ? or !) and capture them

2) Punctuation marks can optionally be followed by whitespace, which is matched as well.

3) After a punctuation mark, I expect a capital letter.

Unlike the previous regular expressions provided, this would properly match the English language grammar.

From there:

4) We replace each match with the captured punctuation mark followed by a pipe |

5) We split on the pipes to create an array of sentences.
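To make the steps above concrete, here is a small sketch that runs them one at a time on the sample text and logs the intermediate string. The variable names text, piped and sentences are only for illustration.

```javascript
// Same sample text as in the answer above
const text = "Hello everyone. My name is James! Tell me your names? So I'd greet you..."

// Steps 1-4: capture the punctuation mark, consume any whitespace after it,
// require a capital letter to follow, then write back the mark plus a pipe
const piped = text.replace(/([.?!])\s*(?=[A-Z])/g, "$1|")
console.log(piped)
// → Hello everyone.|My name is James!|Tell me your names?|So I'd greet you...

// Step 5: split on the pipes to get one array entry per sentence
const sentences = piped.split("|")
console.log(sentences.length)
// → 4
```

The trailing "..." is left alone because no capital letter follows it, and the whitespace between sentences is dropped because it is part of the match but not part of the captured group.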

If you would like to add support for other non-English special characters at the beginning of sentences, you'll have to adjust the regex. Currently, only A-Z are included in our match, but if we add À-ȕ, we can include special characters as well. Overall, we'd end up with something like this, /([.?!])\s*(?=[A-ZÀ-ȕ])/g.

Please note that my experience with non-English characters is limited. The range À-ȕ also contains lowercase letters, so it may need to be adjusted to only allow capital non-English characters.
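One way to restrict the match to capital letters only, as an alternative to the À-ȕ range, is a Unicode property escape. This is just a sketch: it assumes an environment that supports \p{Lu} with the u flag (ES2018 or later), and the names getSentencesWithWordUnicode and frenchText are made up for this example.

```javascript
// Hypothetical variant of getSentencesWithWord using \p{Lu} (any uppercase letter)
function getSentencesWithWordUnicode(word, text) {
  // The u flag is required for \p{...} property escapes
  const sentenceArray = text.replace(/([.?!])\s*(?=\p{Lu})/gu, "$1|").split("|")
  return sentenceArray.filter(sentence => sentence.includes(word))
}

const frenchText = "Je suis ici. Élodie est là! Où est Élodie? ça dépend. Ça va."
console.log(getSentencesWithWordUnicode('Élodie', frenchText))
// → [ 'Élodie est là!', 'Où est Élodie? ça dépend.' ]
```

Because the lowercase "ç" after the question mark is not an uppercase letter, no split happens there, which is the same capitalization requirement the original regex has for A-Z.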

Upvotes: 4
