Reputation: 3039

Restrict a regex to only search between two points

How do I restrict which parts of the text are searched using regular expressions? Given the example below, say I wanted to get the details of customer02. If I use

Name:\s*(.+)

then obviously I will get 3 results. So I want to restrict it to only search under the details for customer02 and stop when it gets to customer03. I could of course index into the results (i.e. results = ['Mr Smith','Mr Jones','Mr Brown'], therefore results[1]), but that seems clumsy.

[Customer01]

Name: Mr Smith

Address: Somewhere

Telephone: 01234567489

[Customer02]

Name: Mr Jones

Address: Laandon

Telephone:

[Customer03]

Name: Mr Brown

Address: Bibble

Telephone: 077764312

Upvotes: 0

Answers (5)

eyquem

Reputation: 27585

Does the following suit you?

ch = """
[Customer01]
Name: Mr Smith
Address: Somewhere 
Telephone: 01234567489

[Customer02] 
Name: Mr Jones 
Address: Laandon 
Telephone: 

[Customer03] 
Name: Mr Brown 
Address: Bibble 
Telephone: 077764312

[Customer04]
Name: Mr Acarid
Address: Carpet 
Telephone: 88864592

[Customer05] 
Name: Mr Johannes 
Address: Zuidersee 
Telephone: 

[Customer06] 
Name: Mr Bringt 
Address: Babylon 
Telephone: 077747812

[Customer07] 
Name: Ms Amanda 
Address: Madrid 
Telephone: 187354988

[Customer88] 
Name: Ms Heighty 
Address: Cairo 
Telephone: 11128

"""

import re

blah = '''Enter the characteristics you want the items to be selected upon :
- the Customer's numbers (separated by commas) : '''
must = {'Customer' : re.findall('0*(\d+)',raw_input(blah)) ,'Name':[],'Address':[],'Telephone':[] }

while True:
    y = raw_input('- strings desired in the Names (void to finish) : ')
    if y:  must['Name'].append(y)
    else:  break

while True:
    y = raw_input('- strings desired in the Addresses (void to finish) : ')
    if y:  must['Address'].append(y)
    else:  break

while True:
    y = raw_input('- strings desired in the Telephone numbers (void to finish) : ')
    if y:  must['Telephone'].append(y)
    else:  break

pat = re.compile('\[Customer0*(?P<Customer>\d+)].*\nName:(?P<Name>.*)\nAddress:(?P<Address>.*)\nTelephone:(?P<Telephone>.*)')

print ch,'\n\nmust==',must,'\n\n'

print '\n'.join( repr(match.groups()) for match in pat.finditer(ch)
                 if any((x==match.group(k) if k=='Customer' else x in match.group(k))
                        for k in must.iterkeys() for x in must[k]) )

For example, entering data so that

must== {'Customer': ['3', '8', '6'], 'Telephone': ['645'], 'Name': [], 'Address': ['Laa']}

the result is

('2', ' Mr Jones ', ' Laandon ', ' ')
('3', ' Mr Brown ', ' Bibble ', ' 077764312')
('4', ' Mr Acarid', ' Carpet ', ' 88864592')
('6', ' Mr Bringt ', ' Babylon ', ' 077747812')

Notice that Customer88 is absent from this result even though '8' was entered as a desired number. That is because customer numbers are compared for equality, by testing

x==match.group(k) if k=='Customer'

Otherwise the test is

x in match.group(k)

Hence the "A if condition_upon_k else B" expression.
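The difference between the two tests can be isolated in a small Python 3 sketch. The names `matches`, `record` and `wanted` are hypothetical and not part of the answer's code; the logic is the same conditional expression applied to a plain dict instead of a match object.

```python
def matches(record, wanted):
    """Return True if any criterion in `wanted` selects `record`.

    Customer numbers must be equal to the criterion; every other
    field only needs to contain the criterion as a substring.
    """
    return any(
        (x == record[k]) if k == 'Customer' else (x in record[k])
        for k, values in wanted.items()
        for x in values
    )


wanted = {'Customer': ['8'], 'Name': [], 'Address': ['Laa'], 'Telephone': []}

heighty = {'Customer': '88', 'Name': ' Ms Heighty ', 'Address': ' Cairo ', 'Telephone': ' 11128'}
jones = {'Customer': '2', 'Name': ' Mr Jones ', 'Address': ' Laandon ', 'Telephone': ' '}

print(matches(heighty, wanted))  # '8' != '88' and 'Laa' is not in ' Cairo '
print(matches(jones, wanted))    # 'Laa' is a substring of ' Laandon '
```

With a substring test on the customer number, '8' would also have selected Customer88.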

Upvotes: 0

Eli

Reputation: 38979

If you know the specific boundaries to search between and you're looking to get a capture group, why not just do:

import re
text = "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312"
# The brackets must be escaped; an unescaped [Customer02] is a character class.
blah = re.search(r"\[Customer02\]\nName:\s*(.*?)\n", text)
print(blah.group(1))

This returns "Mr Jones". I think that's what you want.
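The same idea can be generalised to any section and field. The following is a minimal Python 3 sketch, not the answerer's code: `field_in_section` is a hypothetical helper, `re.escape` is used so the brackets in the header are matched literally, and it assumes that field values never contain a `[` character.

```python
import re

text = (
    "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n"
    "[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n"
    "[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312"
)


def field_in_section(text, section, field):
    """Return the value of `field` inside `section`, or None if not found."""
    pattern = (
        re.escape(section)      # literal "[Customer02]", brackets escaped
        + r"[^\[]*?"            # skip ahead lazily, but never past the next "["
        + re.escape(field)
        + r":[ \t]*(.*)"        # spaces/tabs only, so an empty value stays on its line
    )
    m = re.search(pattern, text)
    return m.group(1).strip() if m else None


print(field_in_section(text, "[Customer02]", "Name"))
print(repr(field_in_section(text, "[Customer02]", "Telephone")))
print(field_in_section(text, "[Customer04]", "Name"))
```

Using `[ \t]*` rather than `\s*` after the colon matters for empty fields such as Customer02's telephone: `\s*` would consume the newline and capture the following line instead.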

Upvotes: 1

senderle

Reputation: 151057

What format is the data in? Is it a string? If efficiency is not a major issue, the obvious thing would be to slice the string:

start = cdata.find("[Customer01]")
end = cdata.find("[Customer02]")
result = re.search(r'Name:\s*(.+)', cdata[start:end]).group(1)

or more tersely:

name = re.search(r'Name:\s*(.+)', cdata[cdata.find("[Customer01]"): cdata.find("[Customer02]")]).group(1)

EDIT: or with error checking:

start = cdata.find("[Customer01]")
end = cdata.find("[Customer02]")
result = re.search(r'Name:\s*(.+)', cdata[start:end])
if result: name = result.group(1)
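One thing the slicing approach needs to guard against is that `str.find` returns -1 when the marker is missing, which silently produces the wrong slice. A minimal Python 3 sketch of a wrapper that handles this follows; `section_text` is a hypothetical helper name, and `cdata` is assumed to be a single string in the question's format.

```python
import re

cdata = (
    "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n"
    "[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n"
    "[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312"
)


def section_text(cdata, start_marker, end_marker=None):
    """Return the text from start_marker up to end_marker (or to the end)."""
    start = cdata.find(start_marker)
    if start == -1:
        return None  # start marker missing: nothing to search
    if end_marker is None:
        end = -1
    else:
        # Look for the end marker only after the start marker.
        end = cdata.find(end_marker, start + len(start_marker))
    if end == -1:
        end = len(cdata)  # no end marker: run to the end of the data
    return cdata[start:end]


chunk = section_text(cdata, "[Customer02]", "[Customer03]")
result = re.search(r'Name:\s*(.+)', chunk) if chunk is not None else None
name = result.group(1) if result else None
print(name)
```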

Upvotes: 1

Andrew White

Reputation: 53516

This is not a problem regular expressions are meant to solve. Your best bet is to parse the data into structures first (possibly using regexes to help "chunk" the data).
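As a rough illustration of that approach, here is a minimal Python 3 sketch that parses the data into a dict of dicts. `parse_customers` is a hypothetical name, and the code assumes each section starts with a bracketed header line followed by `Key: value` lines, as in the question.

```python
import re

text = """[Customer01]
Name: Mr Smith
Address: Somewhere
Telephone: 01234567489

[Customer02]
Name: Mr Jones
Address: Laandon
Telephone:

[Customer03]
Name: Mr Brown
Address: Bibble
Telephone: 077764312
"""


def parse_customers(text):
    """Parse the sections into {section_name: {field: value}}."""
    customers = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between fields and sections
        header = re.fullmatch(r"\[(\w+)\]", line)
        if header:
            # A header line opens a new section.
            current = customers.setdefault(header.group(1), {})
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return customers


customers = parse_customers(text)
print(customers["Customer02"]["Name"])
```

Once the data is in this form, looking up one customer's details is a plain dictionary access and no range-limited regex is needed.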

Upvotes: 3

Ned Batchelder

Reputation: 375754

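The module-level functions in re provide no way to limit the range of the match. You can match against a substring if you already know the indexes you want to limit the search to; the search and match methods of compiled pattern objects also accept pos and endpos arguments for this.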
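If the indexes are known, compiled pattern objects can also be given them directly through the optional `pos` and `endpos` arguments of `search`, which avoids building the substring. This is a minimal Python 3 sketch and not part of the answer above; `cdata` is assumed to be one string in the question's format.

```python
import re

cdata = (
    "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n"
    "[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n"
    "[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312"
)

pat = re.compile(r"Name:[ \t]*(.+)")

start = cdata.find("[Customer02]")
end = cdata.find("[Customer03]")

name = None
if start != -1 and end != -1:
    # Search only cdata[start:end], without copying it.
    m = pat.search(cdata, start, end)
    if m:
        name = m.group(1)

print(name)
```

Offsets reported by the match object, such as `m.start()`, are relative to the full string rather than to the searched range.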

Upvotes: 0
