Rachel

Reputation: 315

Regular Expression to extract words across sentences

My input is

The world is round. We meet everyone quite soon. It is a small world. Be happy.

I want the sentences containing the words small and happy. My regular expression is [.]\\s*.*?small.*?happy.*?[.] and the expected output is

It is a small world. Be happy.

but the output I get is

. We meet everyone quite soon. It is a small world. Be happy.

Could someone please help me with this?

Upvotes: 1

Views: 299

Answers (4)

hwnd

Reputation: 70732

You can use a word boundary \b here.

\b[^.]*small.*?happy[^.]*\.

Or make your own boundary.

(?:^|\. )([^.]*small.*?happy[^.]*\.)
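
As a quick check, here is a minimal sketch that runs both patterns against the sample input from the question. It assumes Python's re module (the question does not name a regex flavor), and the variable names are only illustrative.

```python
import re

text = ("The world is round. We meet everyone quite soon. "
        "It is a small world. Be happy.")

# Word boundary version: the whole match is the wanted text.
word_boundary = re.search(r"\b[^.]*small.*?happy[^.]*\.", text)

# Custom boundary version: the leading ". " is consumed by the
# non-capturing group, so the wanted text is in capture group 1.
custom_boundary = re.search(r"(?:^|\. )([^.]*small.*?happy[^.]*\.)", text)

print(word_boundary.group(0))
print(custom_boundary.group(1))
```

With the second pattern, read group 1 rather than the whole match, otherwise the leading period and space come back with it.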

Upvotes: 3

hsz

Reputation: 152226

Try the following regex:

((?<=^|\. )[^.]*?(?:small|happy)[^.]*\.)

Output:

MATCH 1
1.  [49-69] `It is a small world.`
MATCH 2
1.  [70-79] `Be happy.`
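
Note that this pattern returns each sentence containing either word as a separate match. For anyone trying it in Python (an assumption, since the question does not name a flavor), the re module rejects (?<=^|\. ) because its lookbehind alternatives have different widths. A hedged adaptation is to move ^ outside the lookbehind, as sketched here with illustrative names:

```python
import re

text = ("The world is round. We meet everyone quite soon. "
        "It is a small world. Be happy.")

# Start either at the beginning of the input or right after ". ".
# The lookbehind now has a fixed width of two characters.
pattern = re.compile(r"(?:^|(?<=\. ))[^.]*?(?:small|happy)[^.]*\.")

# Collect (start, end, text) for every sentence that contains either word.
matches = [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]

for start, end, sentence in matches:
    print(f"[{start}-{end}] {sentence}")
```

The offsets this prints should line up with the ones listed in the output above.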

Upvotes: 0

Avinash Raj

Reputation: 174706

You could try the regex below:

(?<=^|\. )[^.]*small.*?happy[^.]*\.

Upvotes: 2

Unihedron

Reputation: 11041

Use these regexes:

(?<!\.)[^.]*small[^.]*\.
(?<!\.)[^.]*happy[^.]*\.

Here is a regex demo.

  • (?<!\.) Asserts that the current position is not immediately preceded by a period, so the match starts after the whitespace between sentences instead of on it.
  • [^.]* Matches any run of characters other than a period, which keeps the match inside a single sentence.
  • happy The literal character sequence "happy" (or "small" in the first regex).
  • [^.]* Matches the rest of the sentence up to its period.
  • \. Matches the period that ends the sentence.
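
A minimal sketch of how the two regexes could be applied and their results combined, assuming Python's re module and assuming that joining the matched sentences with a single space is acceptable (neither is stated in the question):

```python
import re

text = ("The world is round. We meet everyone quite soon. "
        "It is a small world. Be happy.")

# One search per keyword; each returns the first sentence containing it.
small_sentence = re.search(r"(?<!\.)[^.]*small[^.]*\.", text)
happy_sentence = re.search(r"(?<!\.)[^.]*happy[^.]*\.", text)

# Keep only the searches that matched, in the order they were run.
found = [m.group(0) for m in (small_sentence, happy_sentence) if m]
combined = " ".join(found)

print(combined)
```

Because each keyword is searched for on its own, this also works when the two words are in sentences that are not adjacent.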

Upvotes: 0
