Reputation: 51
I have a text and used a function to extract part of it. However, the returned value has the delimiters (e.g. ',' and '-') removed. I need to find the extracted part in the original text, getting both the matching substring and its position. For example:
original_text = "xyz, 19900 Praha 9, Letnany"
(or original_text = "xyz, 19900 Praha 9 - Letnany")
extracted_text = "praha 9 letnany" (lower case, delimiters are removed)
I expect the same output as re.search('praha 9, letnany', original_text, re.I), i.e. the substring 'Praha 9, Letnany' and the start of the match, 11.
Is there a regular expression that can locate the extracted text in the original text?
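For reference, the target result can be reproduced by hand when the delimiter is known in advance. The snippet below only illustrates the expected output for the first example text; the re.I flag is needed because the extracted text is lower case.

```python
import re

original_text = "xyz, 19900 Praha 9, Letnany"

# Pattern written by hand with the comma put back; re.I ignores case.
expected = re.search("praha 9, letnany", original_text, re.I)

substring = expected.group()  # matched text exactly as it appears in the original
start = expected.start()      # index of the first matched character
print(substring, start)       # Praha 9, Letnany 11
```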
Upvotes: 1
Views: 125
Reputation: 51
Same idea as @ScottHunter's answer, but processing at the word level instead of the character level:
import re
ori_txt = '19900, Praha 7, Letnany'
extr_txt = 'praha 7 letnany'
delimiters = [',', r'\s', '-']
# Split the extracted text into words on any of the delimiters.
deli = '|'.join(delimiters)
extr_arr = [w for w in re.split(deli, extr_txt) if w]
# Character class that matches any run of delimiters: [,\s-]*
ins_c = '[' + ''.join(delimiters) + ']*'
# Join the escaped words with the delimiter class to build the pattern.
pat = ins_c.join(re.escape(w) for w in extr_arr)
mat = re.search(pat, ori_txt, re.I)
if mat:
    print(mat.group())
else:
    print('not found')
I first wanted to find a single regular expression that searches for the extracted text directly in the original text, but there seems to be no such expression. Here is another way to solve my problem. Thank you.
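The word-level idea can also be wrapped in a small function that returns the substring and start index the question asks for. This is a minimal sketch, not a definitive implementation: the name locate_words and its delimiters parameter are hypothetical, and the default delimiter set (comma, whitespace, dash) is an assumption. It uses + rather than *, so words must be separated by at least one delimiter in the original.

```python
import re

def locate_words(extracted, original, delimiters=r",\s-"):
    """Return (substring, start) of `extracted` in `original`, or None.

    `delimiters` is the body of a regex character class (an assumption:
    comma, whitespace and dash by default).
    """
    sep = "[" + delimiters + "]+"
    # Split the extracted text into words, dropping empty pieces.
    words = [w for w in re.split(sep, extracted) if w]
    if not words:
        return None
    # Escape each word and allow any run of delimiters between words.
    pattern = sep.join(re.escape(w) for w in words)
    match = re.search(pattern, original, re.IGNORECASE)
    return (match.group(), match.start()) if match else None

print(locate_words("praha 9 letnany", "xyz, 19900 Praha 9, Letnany"))
print(locate_words("praha 9 letnany", "xyz, 19900 Praha 9 - Letnany"))
```

Both example texts from the question yield a match starting at index 11; a text that does not contain the words in order returns None.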
Upvotes: 0
Reputation: 49813
This will locate a span in the original text that matches the extracted text, ignoring case and allowing delimiters (in this case, comma or dash) to appear anywhere between its characters:
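import re
original_text = "xyz, 19900 Praha 9, Letnany"
extracted_text = "praha 9 letnany"
# A space in the extracted text matches a run of whitespace, commas or dashes;
# every other character is escaped and matched literally.
parts = [r"[\s,-]+" if c == " " else re.escape(c) for c in extracted_text]
# Optional commas or dashes may also appear between any two characters.
pat = "[,-]*".join(parts)
mat = re.search(pat, original_text, re.I)
if mat:
    print(mat.span())
else:
    print("No match")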
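To get the substring and start index the question asks for, rather than only the span, the character-level pattern can be wrapped in a helper. The following is a sketch under assumptions: locate_chars is a hypothetical name, and a space in the extracted text is taken to stand for any run of whitespace, commas or dashes, so that both "9, Letnany" and "9 - Letnany" are covered.

```python
import re

def locate_chars(extracted, original):
    """Return (substring, start) of `extracted` in `original`, or None."""
    parts = []
    for ch in extracted:
        if ch.isspace():
            # A space stands for a run of whitespace and/or delimiters.
            parts.append(r"[\s,-]+")
        else:
            # Escape so characters such as '.' or '(' match literally.
            parts.append(re.escape(ch))
    # Optional commas or dashes may appear between any two pieces.
    pattern = "[,-]*".join(parts)
    match = re.search(pattern, original, re.IGNORECASE)
    return (match.group(), match.start()) if match else None

print(locate_chars("praha 9 letnany", "xyz, 19900 Praha 9, Letnany"))
print(locate_chars("praha 9 letnany", "xyz, 19900 Praha 9 - Letnany"))
```

Unlike a word-level pattern, this one also tolerates a delimiter inside a word, for example a hyphenated "Let-nany" in the original.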
Upvotes: 2