Miguel

Reputation: 51

Regular expression matching a sequence of words

Let's suppose we have a paragraph like this:

Lorem ipsum, sit amet consectetur adipiscing elit. Lorem - ipsum, sit amet. Morbi a suscipit sem, quis finibus turpis. Lorem ipsum: sit amet. Proin suscipit ac arcu pharetra tincidunt. Lorem ipsum. sit amet. Pellentesque eu lacinia metus. sit amet: Lorem ipsum. Lorem turpis ipsum, sit amet.

I need a case-insensitive PCRE regex pattern that only selects the words

1 lorem
2 ipsum
3 sit
4 amet 

in that specific order, ignoring punctuation, and skipping occurrences like

Sit amet lorem ipsum
Lorem turpis ipsum, sit amet

Upvotes: 1

Views: 2988

Answers (1)

futu

Reputation: 897

A simple, straightforward approach is to allow a fixed set of punctuation characters between the words. You can add any other punctuation character inside the []:

([Ll]orem)[\s,.!:\-()?]+(ipsum)[\s,.!:\-()?]+(sit)[\s,.!:\-()?]+(amet)

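or allow anything that is whitespace or a non-word character, i.e. not in [A-Za-z0-9_] (note that \W already covers whitespace, so the \s is redundant but harmless):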

([Ll]orem)[\s\W]+(ipsum)[\s\W]+(sit)[\s\W]+(amet)

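Case insensitivity is usually a flag you can switch on, depending on the programming language (in PCRE, the i modifier or an inline (?i) at the start of the pattern). Otherwise you have to add every relevant variation manually, like ([Ll]orem). Don't write [L|l]: inside a character class the | is a literal character, not an alternation.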
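To illustrate the flag approach, here is a minimal sketch using Python's re module (my assumption for the demo; the question asks about PCRE, but this pattern uses only syntax both engines share). It takes the second pattern in lowercase and lets the flag handle case. The helper name find_sequences and the shortened sample text are made up for the example.

```python
import re

# Second pattern from above, written in lowercase.
# re.IGNORECASE replaces the manual [Ll] variations.
SEQUENCE = re.compile(
    r"(lorem)[\s\W]+(ipsum)[\s\W]+(sit)[\s\W]+(amet)",
    re.IGNORECASE,
)


def find_sequences(text):
    """Return every 'lorem ipsum sit amet' run found in text (hypothetical helper)."""
    return [match.group(0) for match in SEQUENCE.finditer(text)]


# Shortened version of the paragraph from the question.
sample = (
    "Lorem ipsum, sit amet consectetur. Lorem - ipsum, sit amet. "
    "sit amet: Lorem ipsum. Lorem turpis ipsum, sit amet."
)

for hit in find_sequences(sample):
    print(hit)
```

On this sample only the first two sentences should match; "sit amet: Lorem ipsum" and "Lorem turpis ipsum, sit amet" are skipped because the words are out of order or interrupted by another word.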

Regex101 Example

Upvotes: 1
