Reputation: 3039
How do I restrict which parts of the text are searched using regular expressions? Given the example below, say I wanted to get the details of customer02. If I use
Name:\s*(.+)
then obviously I will get 3 results. I want to restrict the search to the details for customer02 and stop when it reaches customer03. I could of course index into the results (i.e. results = ['Mr Smith','Mr Jones','Mr Brown'], therefore results[1]), but that seems clumsy.
[Customer01]
Name: Mr Smith
Address: Somewhere
Telephone: 01234567489
[Customer02]
Name: Mr Jones
Address: Laandon
Telephone:
[Customer03]
Name: Mr Brown
Address: Bibble
Telephone: 077764312
Upvotes: 0
Views: 878
Reputation: 27585
Does the following suit you?
ch = """
[Customer01]
Name: Mr Smith
Address: Somewhere
Telephone: 01234567489
[Customer02]
Name: Mr Jones
Address: Laandon
Telephone:
[Customer03]
Name: Mr Brown
Address: Bibble
Telephone: 077764312
[Customer04]
Name: Mr Acarid
Address: Carpet
Telephone: 88864592
[Customer05]
Name: Mr Johannes
Address: Zuidersee
Telephone:
[Customer06]
Name: Mr Bringt
Address: Babylon
Telephone: 077747812
[Customer07]
Name: Ms Amanda
Address: Madrid
Telephone: 187354988
[Customer88]
Name: Ms Heighty
Address: Cairo
Telephone: 11128
"""
import re
blah = '''Enter the characteristics you want the items to be selected upon :
- the Customer's numbers (separated by commas) : '''
must = {'Customer' : re.findall('0*(\d+)',raw_input(blah)) ,'Name':[],'Address':[],'Telephone':[] }
while True:
    y = raw_input('- strings desired in the Names (void to finish) : ')
    if y: must['Name'].append(y)
    else: break
while True:
    y = raw_input('- strings desired in the Addresses (void to finish) : ')
    if y: must['Address'].append(y)
    else: break
while True:
    y = raw_input('- strings desired in the Telephone numbers (void to finish) : ')
    if y: must['Telephone'].append(y)
    else: break
pat = re.compile('\[Customer0*(?P<Customer>\d+)].*\nName:(?P<Name>.*)\nAddress:(?P<Address>.*)\nTelephone:(?P<Telephone>.*)')
print ch,'\n\nmust==',must,'\n\n'
print '\n'.join( repr(match.groups()) for match in pat.finditer(ch)
if any((x==match.group(k) if k=='Customer' else x in match.group(k))
for k in must.iterkeys() for x in must[k]) )
For example, entering data such that
must== {'Customer': ['003', '8', '6'], 'Telephone': ['645'], 'Name': [], 'Address': ['Laa']}
the result is
('2', ' Mr Jones ', ' Laandon ', ' ')
('3', ' Mr Brown ', ' Bibble ', ' 077764312')
('4', ' Mr Acarid', ' Carpet ', ' 88864592')
('6', ' Mr Bringt ', ' Babylon ', ' 077747812')
Notice that the entry for Customer88 is absent from this result, even though '8' was entered as a desired number. That is because customer numbers are compared for equality:
x==match.group(k) if k=='Customer'
whereas for the other fields the test is substring membership:
x in match.group(k)
Hence the "A if condition_upon_k else B" expression.
Upvotes: 0
Reputation: 38979
If you know the specific boundaries to search between and you're looking to get a capture group, why not just do:
import re
text = "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312"
# The square brackets must be escaped, otherwise they form a character class
blah = re.search(r"\[Customer02\]\nName:\s*(.*?)\n", text)
print blah.group(1)
This returns "Mr Jones". I think that's what you want.
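Square brackets are regex metacharacters, so a literal header such as [Customer02] has to be escaped before it goes into a pattern. As a minimal sketch (written for Python 3; the helper name customer_name and the sample variable are made up for illustration), re.escape can do that for any customer header:

```python
import re

def customer_name(text, customer_id):
    """Return the Name field that directly follows the given header, or None.

    Hypothetical helper: assumes the Name line comes right after the header.
    """
    # re.escape turns "[Customer02]" into a literal match, not a character class
    header = re.escape("[%s]" % customer_id)
    match = re.search(header + r"\nName:[ \t]*(.*)", text)
    return match.group(1).strip() if match else None

sample = (
    "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n"
    "[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n"
    "[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312"
)

print(customer_name(sample, "Customer02"))
```

Because the header is part of the pattern, the match is anchored to the right section without slicing the text first.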
Upvotes: 1
Reputation: 151057
What format is the data in? Is it a string? If efficiency is not a major issue, the obvious thing would be to slice the string:
import re

# Slice from the wanted header up to the next one, then search only that slice
start = cdata.find("[Customer02]")
end = cdata.find("[Customer03]")
result = re.search(r'Name:\s*(.+)', cdata[start:end]).group(1)
or more tersely:
name = re.search(r'Name:\s*(.+)', cdata[cdata.find("[Customer02]"):cdata.find("[Customer03]")]).group(1)
EDIT: or with error checking:
start = cdata.find("[Customer02]")
end = cdata.find("[Customer03]")
result = re.search(r'Name:\s*(.+)', cdata[start:end])
if result:
    name = result.group(1)  # group(1) is the captured name, group(0) the whole match
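One caveat with slicing between two named headers is that str.find returns -1 when a header is missing, which silently gives the wrong slice. A rough sketch of a more defensive version follows (Python 3; field_in_section is a hypothetical helper, and it assumes every section header starts a line with "["):

```python
import re

def field_in_section(cdata, header, field):
    """Return the value of `field` inside the section that starts at `header`."""
    start = cdata.find(header)
    if start == -1:
        return None  # header not present at all
    # The section ends where the next header line begins, or at end of text
    end = cdata.find("\n[", start + len(header))
    if end == -1:
        end = len(cdata)
    # [ \t]* rather than \s* so an empty value does not run onto the next line
    match = re.search(re.escape(field) + r":[ \t]*(.*)", cdata[start:end])
    return match.group(1).strip() if match else None

sample = (
    "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n"
    "[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n"
    "[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312"
)

print(field_in_section(sample, "[Customer02]", "Name"))
```

This also covers the last section in the text, where there is no following header to use as the end marker.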
Upvotes: 1
Reputation: 53516
This is not a problem regular expressions are meant to solve. Your best bet is to parse out the data into structures first (possibly using regexs to aid in "chunking" the data).
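As an illustration of the "parse first" idea, here is a small sketch (Python 3; parse_customers is a made-up name, and it assumes the layout from the question: a bracketed header line followed by "Key: value" lines):

```python
import re

def parse_customers(text):
    """Chunk the text into {header: {field: value}} dictionaries."""
    customers = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = re.match(r"\[(.+)\]$", line)
        if header:
            # A new section starts; later fields are stored under it
            current = customers.setdefault(header.group(1), {})
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return customers

sample = (
    "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n"
    "[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n"
    "[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312"
)

customers = parse_customers(sample)
print(customers["Customer02"]["Name"])
```

Once the data is in dictionaries, looking up one customer is a plain key access and no regex has to be restricted to a range.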
Upvotes: 3
Reputation: 375754
The module-level functions in re provide no way to limit the range of the match. You can match against a substring if you already know the indexes you want to limit the search to.
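One nuance: while module-level functions such as re.search take no range arguments, the search() method of a compiled pattern accepts optional pos and endpos arguments, which restrict the search without copying a substring. A minimal sketch under the assumption that both headers are present (Python 3; cdata and name_pat are illustrative names):

```python
import re

cdata = (
    "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n"
    "[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n"
    "[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312"
)

name_pat = re.compile(r"Name:[ \t]*(.+)")

# Index bounds of the section we care about (assumed to exist in cdata)
start = cdata.find("[Customer02]")
end = cdata.find("[Customer03]")

# pos and endpos limit the search to cdata[start:end] without slicing
match = name_pat.search(cdata, start, end)
name = match.group(1) if match else None
print(name)
```

Match positions reported by match.start() and match.end() stay relative to the full string, which can be convenient compared with slicing.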
Upvotes: 0