Hypothetical Ninja

Reputation: 4077

Conditioning on Regex

I have several strings from which I need to extract block numbers. They come in formats such as "3rd block", "pine block", "block 2" and "block no 4". These are only examples of the formats; the actual numbers and names vary. I have combined the patterns with OR (|) in a single regex.

The problem is that the regex sometimes captures the preceding word, which belongs to something else. For example, from "main phase block 2" I need "block 2" to be extracted. re.search returns only the first match, and the OR alternation has its own limitations.

What I want is to add exceptions or conditions to my regex, something like:

  1. If 1 or 2 digits (like 23, 3, 6, 7) occur before the word "block", extract "block" together with the word that follows it.

    E.g.:

     string = "rmv clusters phase 2 block 1 , flat no 209 dev." #extract "block 1" and not "2 block".
    
  2. If one of the words "phase", "apartment" or "building" comes before "block", extract the word that follows "block" (whether it is a number or a word).

    E.g.:

     string2 = "sky line apartments block 2 chandra layout" #extract "block 2" and not "apartments block"
    

Here is what I have so far, but I have no idea how to add such conditions.

    p = re.compile(r'(block[^a-z]\s\d*)|(\w+\sblock[^a-z])|(block\sno\s\d+)')
    q = p.search(str)

This is part of a larger function.

Upvotes: 0

Views: 143

Answers (3)

Aaron Hall

Reputation: 395085

Tested on Python 2.7 and 3.3.

import re

# Adjacent string literals are concatenated, so this is one long test string.
strings = ("rmv clusters phase 2 block 1 , flat no 209 dev. "
           "sky line apartments block 2 chandra layout "
           "foo bar 99 block baz") # tests rule 1.

Here are the rules you stated:

  1. If 1 or 2 digits (like 23, 3, 6, 7) occur before the word "block", extract "block" with the word following "block".
  2. If one of the words "phase", "apartment" or "building" comes before "block", extract the word that follows "block" (irrespective of whether it's a number or a word). I'm inferring you want the word "block" too.

So

regex = re.compile(r'''
           (?:\d{1,2}\s)(block\s\w*) # rule 1
             |   # or
           (?:(phase|apartment|building).*?)(block\s\w+) # rule 2
             ''', re.X)

found = regex.finditer(strings)

for i in found:
    print(i.groups())

prints:

(None, 'phase', 'block 1')
(None, 'apartment', 'block 2')
('block baz', None, None)

None is the default for a group that did not participate in the match, so you can pick a preference and let the short-circuiting "or" return the first group if it is non-empty, or the second if the first is empty (i.e. evaluates as False in Python's boolean contexts).

>>> found = regex.finditer(strings)
>>> for i in found:
...   print(i.group(1) or i.group(3))
... 
block 1
block 2
block baz
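As a variation on the numeric group indexes above, here is a minimal sketch using named groups, which can make the "or" fallback easier to read. The group names rule1 and rule2 and the helper pick are hypothetical, chosen only for illustration; the pattern otherwise follows the two rules as written.

```python
import re

# Same two alternatives as above, with hypothetical group names
# (rule1, rule2) in place of numeric indexes.
regex = re.compile(r'''
    \d{1,2}\s(?P<rule1>block\s\w*)                        # rule 1
    |                                                     # or
    (?:phase|apartment|building).*?(?P<rule2>block\s\w+)  # rule 2
''', re.X)

def pick(match):
    # A group that did not take part in the match is None,
    # so "or" falls through to the other alternative.
    return match.group('rule1') or match.group('rule2')

print(pick(regex.search("foo bar 99 block baz")))
print(pick(regex.search("sky line apartments block 2 chandra layout")))
```

With named groups there is no need to remember that the keyword group sits between the two captures.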

So to put this thing into a simple function:

def block(text):
    # "text" rather than "str", to avoid shadowing the built-in type.
    regex = re.compile(r'''
               (?:\d{1,2}\s)(block\s\w*) # rule 1
                 |   # or
               (?:(phase|apartment|building).*?)(block\s\w+) # rule 2
                 ''', re.X)
    match = regex.search(text)
    if not match:
        return ''
    return match.group(1) or match.group(3) or ''

usage:

>>> block("foo bar 99 block baz")
'block baz'
>>> block("sky line apartments block 2 chandra layout")
'block 2'

Upvotes: 1

Amit

Reputation: 20456

>>> import re
>>> string = "rmv clusters phase 2 block 1 , flat no 209 dev."
>>> string2 = "sky line apartments block 2 chandra layout"
>>> print(re.findall(r'block\s+\d+', string))
['block 1']
>>> print(re.findall(r'block\s+\d+', string2))
['block 2']
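
The pattern above only covers the "block 2" style. A small sketch of how it might be widened, assuming the "block no 4" format from the question should match as well; the second test string is made up for illustration.

```python
import re

# Optional "no" between "block" and the digits. The group is non-capturing
# so that findall keeps returning the whole match.
pattern = re.compile(r'block\s+(?:no\s+)?\d+')

print(pattern.findall("rmv clusters phase 2 block 1 , flat no 209 dev."))
print(pattern.findall("green park block no 4 main road"))  # hypothetical input
```

Word-named blocks such as "pine block" are still not handled by this pattern.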

Upvotes: 1

BiGYaN

Reputation: 7159

Why not use multiple regexes? See the following Python 3 snippet:

import re

def getBlockMatch(string):
    # Try the strict "block <number>" pattern first, then fall back
    # to the broader pattern from the question.
    p1Regex = re.compile(r'block\s+\d+')
    p2Regex = re.compile(r'(block[^a-z]\s\d*)|(\w+\sblock[^a-z])|(block\sno\s\d+)')
    if p1Regex.search(string) is not None:
        return p1Regex.findall(string)
    else:
        return p2Regex.findall(string)

string = "rmv clusters phase 2 block 1 , flat no 209 dev."
print(getBlockMatch(string))

string = "sky line apartments block 2 chandra layout"
print(getBlockMatch(string))

Outputs:

['block 1']
['block 2']
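
The same idea can be generalized to an ordered list of patterns, tried from most to least specific. This is only a sketch: the names PATTERNS and first_block_match are hypothetical, the ordinal pattern for "3rd block" is an assumption about what that format looks like, and word-named blocks such as "pine block" are left out because they are ambiguous with cases like "apartments block".

```python
import re

# Hypothetical priority list: the most specific pattern comes first.
PATTERNS = [
    re.compile(r'block\s+no\s+\d+'),            # "block no 4"
    re.compile(r'block\s+\d+'),                 # "block 2"
    re.compile(r'\d+(?:st|nd|rd|th)\s+block'),  # "3rd block"
]

def first_block_match(text):
    # Return the match from the first pattern that finds anything,
    # or an empty string if none of them match.
    for pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(0)
    return ''

print(first_block_match("rmv clusters phase 2 block 1 , flat no 209 dev."))
print(first_block_match("sky line apartments block 2 chandra layout"))
```

Keeping each pattern separate makes it easy to reorder them or add a new format without touching the others.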

Upvotes: 1
