Fatimah Altuhaifa

Reputation: 31

RegEx in Python: extracting a specific word followed by up to three other words, more than once

I need to extract the word "lack" together with the one or three words following it from free text using RegEx:

import re
import string
Text = "lack of stair handrails, slippery surfaces, tripping hazards, lack of bathroom grab bars, lack floor"
new_data = re.search(r"(lack (\w+\W+){3})", Text)

print(new_data.group())

The result I get is only one match:

lack of stair handrails,

but I need the result to be

lack of stair handrails
lack of bathroom grab bars
lack floor

Thanks in advance

Upvotes: 1

Views: 216

Answers (3)

The fourth bird

Reputation: 163362

You can match at least one word after lack, then use [^\w,] instead of \W so that the comma is not matched, and repeat that part 0-2 times so that 1-3 words follow lack.

Note that with a maximum of 3 words after lack, the match for the text lack of bathroom grab bars will be lack of bathroom grab.

If you want to match 1 or more words after it, you can change {0,2} to *.

\black \w+(?:[^\w,]\w+){0,2}
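
As a rough sketch of how this pattern behaves in Python (the variable names are only illustrative), re.findall returns every match rather than just the first, and swapping {0,2} for * picks up the fourth word:

```python
import re

text = ("lack of stair handrails, slippery surfaces, tripping hazards, "
        "lack of bathroom grab bars, lack floor")

# At most three words after "lack"; [^\w,] stops the match at a comma.
up_to_three = re.findall(r"\black \w+(?:[^\w,]\w+){0,2}", text)
print(up_to_three)
# ['lack of stair handrails', 'lack of bathroom grab', 'lack floor']

# One or more words after "lack", still stopping at a comma.
one_or_more = re.findall(r"\black \w+(?:[^\w,]\w+)*", text)
print(one_or_more)
# ['lack of stair handrails', 'lack of bathroom grab bars', 'lack floor']
```

Because the group is non-capturing, re.findall returns the full matched strings instead of the contents of the group.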


If there should not be another lack matched, you can check the matched word after it:

\black (?!lack\b)\w+(?:[^\w,](?!lack\b)\w+){0,2}
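
To illustrate the difference, here is a small sketch on a made-up sentence (not from the question) in which a second lack follows within three words. Without the lookahead the second lack is swallowed by the first match; with it, both phrases are found:

```python
import re

# Hypothetical input: no comma separates the two "lack" phrases.
sample = "lack of sleep lack of food"

# Without the lookahead, "lack" is consumed as the third following word.
plain = re.findall(r"\black \w+(?:[^\w,]\w+){0,2}", sample)
print(plain)    # ['lack of sleep lack']

# (?!lack\b) refuses to take "lack" as one of the following words.
guarded = re.findall(
    r"\black (?!lack\b)\w+(?:[^\w,](?!lack\b)\w+){0,2}", sample
)
print(guarded)  # ['lack of sleep', 'lack of food']
```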


Upvotes: 2

dawg

Reputation: 103884

You can use (\black\b[^,]*)


Explanation:

  1. \b limits the match to the word 'lack', so it is not matched as a substring inside another word;
  2. [^,]* matches any run of characters other than ','.

Python:

>>> import re
>>> s="lack of stair handrails, slippery surfaces, tripping hazards, lack of bathroom grab bars, lack floor"
>>> re.findall(r'\black\b[^,]*',s)
['lack of stair handrails', 'lack of bathroom grab bars', 'lack floor'] 
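
For context, a short sketch of why the code in the question printed only one phrase: re.search stops at the first match, whereas re.findall and re.finditer walk the whole string. The variable names below are illustrative:

```python
import re

s = ("lack of stair handrails, slippery surfaces, tripping hazards, "
     "lack of bathroom grab bars, lack floor")
pattern = r"\black\b[^,]*"

# re.search returns only the first match object (or None).
first = re.search(pattern, s)
print(first.group())

# re.findall collects every non-overlapping match as a string.
every = re.findall(pattern, s)

# re.finditer yields match objects, useful when positions are needed too.
for m in re.finditer(pattern, s):
    print(m.start(), m.group())
```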

Upvotes: 1

werden_wissen

Reputation: 73

If you're working in Python 3.9+, you might like to try an open-source package I published recently called pregex. With pregex, you can build your pattern like this:

from pregex import *

pre = \
    "lack" + \
    op.Either(
        3 * (tk.Space() + Word()),
        tk.Space() + Word()
    )

You can then even fetch the underlying regex pattern:

regex = pre.get_pattern()

which returns the RegEx pattern that you want:

lack(?:(?: \b\w+\b){3}| \b\w+\b)

Note though that the above pattern will result in the following matches:

['lack of stair handrails', 'lack of bathroom grab', 'lack floor']

Since you wanted 1 or 3 words after "lack", the match "lack of bathroom grab" does not include the word "bars". This is easily fixed:

pre = \
    "lack" + \
    op.Either(
        qu.AtLeastAtMost(tk.Space() + Word(), n=3, m=4),
        tk.Space() + Word()
    )

which results in the following pattern:

lack(?:(?: \b\w+\b){3,4}| \b\w+\b)
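
As a quick check that does not require pregex itself, the two patterns quoted above can be run through the standard re module. This is only a sketch: it takes the pattern strings exactly as printed in this answer and does not call any pregex API.

```python
import re

text = ("lack of stair handrails, slippery surfaces, tripping hazards, "
        "lack of bathroom grab bars, lack floor")

# Exactly three words, or exactly one word, after "lack".
one_or_three = re.findall(r"lack(?:(?: \b\w+\b){3}| \b\w+\b)", text)
print(one_or_three)
# ['lack of stair handrails', 'lack of bathroom grab', 'lack floor']

# Three to four words, or exactly one word, after "lack".
with_fourth = re.findall(r"lack(?:(?: \b\w+\b){3,4}| \b\w+\b)", text)
print(with_fourth)
# ['lack of stair handrails', 'lack of bathroom grab bars', 'lack floor']
```

Note that these patterns have no \b before lack, so a word such as "slack" would also match; prefixing the pattern with \b avoids that.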

Upvotes: 1
