Reputation: 51
Let's suppose we have a paragraph like this:
Lorem ipsum, sit amet consectetur adipiscing elit. Lorem - ipsum, sit amet. Morbi a suscipit sem, quis finibus turpis. Lorem ipsum: sit amet. Proin suscipit ac arcu pharetra tincidunt. Lorem ipsum. sit amet. Pellentesque eu lacinia metus. sit amet: Lorem ipsum. Lorem turpis ipsum, sit amet.
I need a case-insensitive PCRE regex pattern that selects only the words
1 lorem
2 ipsum
3 sit
4 amet
in that specific order, ignoring punctuation and ignoring occurrences like
Sit amet lorem ipsum
Lorem turpis ipsum, sit amet
Upvotes: 1
Views: 2988
Reputation: 897
A simple, straightforward approach is to list the allowed punctuation characters explicitly. You can add any other punctuation character inside the []:
([Ll]orem)[\s,.!:\-()?]+(ipsum)[\s,.!:\-()?]+(sit)[\s,.!:\-()?]+(amet)
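Or allow any run of non-word characters between the words, i.e. anything that is not [A-Za-z0-9_]. Since \W already includes whitespace, the \s from the first pattern is not needed here:
([Ll]orem)\W+(ipsum)\W+(sit)\W+(amet)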
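Case sensitivity can usually be switched off with a flag, depending on the programming language (in PCRE, the i modifier or an inline (?i) at the start of the pattern). Otherwise you have to spell out every relevant variation manually, like ([Ll]orem). Note that | is a literal character inside [], so [L|l] would also match a pipe.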
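As a rough illustration, here is a minimal sketch of the \W+ variant with case-insensitive matching turned on. It uses Python's re module rather than PCRE itself, on the assumption that this particular syntax behaves the same in both. The \b word boundaries and the find_sequences helper are additions for the example, not part of the patterns above.

```python
import re

# Sample paragraph from the question.
TEXT = (
    "Lorem ipsum, sit amet consectetur adipiscing elit. "
    "Lorem - ipsum, sit amet. Morbi a suscipit sem, quis finibus turpis. "
    "Lorem ipsum: sit amet. Proin suscipit ac arcu pharetra tincidunt. "
    "Lorem ipsum. sit amet. Pellentesque eu lacinia metus. "
    "sit amet: Lorem ipsum. Lorem turpis ipsum, sit amet."
)

# \W+ allows any run of non-word characters (spaces, punctuation) between
# the four words. The \b anchors are an addition for this sketch: they keep
# the match from starting or ending in the middle of a longer word.
PATTERN = re.compile(r"\b(lorem)\W+(ipsum)\W+(sit)\W+(amet)\b", re.IGNORECASE)


def find_sequences(text):
    """Return one tuple of the four captured words per match (hypothetical helper)."""
    return [m.groups() for m in PATTERN.finditer(text)]


matches = find_sequences(TEXT)
for groups in matches:
    print(groups)
```

With re.IGNORECASE the character class [Ll] is no longer needed, and sequences such as "Sit amet lorem ipsum" or "Lorem turpis ipsum, sit amet" are skipped because the four words do not appear consecutively in the required order.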
Upvotes: 1