Elip

Reputation: 551

Python regular expressions - find a string in a file that occurs somewhere before another string?

My programming knowledge is very limited, so I would really appreciate any help with this possibly obvious problem!

Let's say I have a text file that somewhere contains the text: "I own two (Some text in between...) bicycles."

How could I, for example, change "two" to "three"? That is, I need a function that finds the string "bicycles", then looks to the left until it finds the string "two", and changes that.

Upvotes: 1

Views: 188

Answers (2)

eyquem

Reputation: 27575

With regular expressions:

import re

line = '-------------------------------------------------------------\n'

ss = ('I gave two similar things to my two twin sons: '
      'two spankings, two nice BICYCLES of 300 dollars each, '
      'yes 600 dollars for two horridly nice BICYCLES, '
      'two times 300 dollars for two sons receiving two BICYCLES !, '
      'two dollars too, but never two dogs')
print ss,'\n\n'


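print line + '1) Replacing the more at right before the first "BICYCLES":\n'
# Match a "two" with no other "two" between it and the next BICYCLES;
# the trailing (.+) consumes the rest, so only that one match is replaced.
reg = re.compile('two(?=(?:.(?!two))*?BICYCLES)(.+)')
print reg.sub('@@@@\\1',ss)


print line + '2) Replacing the more at right before the last "BICYCLES":\n'
# Same lookahead, but the BICYCLES found must not be followed by another one.
reg = re.compile('two(?=(?:.(?!two))*?BICYCLES(?!.*?BICYCLES))')
print reg.sub('@@@@',ss)


print line + '3) Replacing all before the first "BICYCLES":\n'
# Either a "two" (group 1) or everything from the first BICYCLES onward,
# which the lambda puts back unchanged.
reg = re.compile('(two)|BICYCLES.+')
print reg.sub(lambda mat: '@@@@' if mat.group(1) else mat.group(),ss)


print line + '4) Replacing all before the last "BICYCLES":\n'
# As in 3), but the unchanged tail starts at the last BICYCLES.
reg = re.compile('(two)|BICYCLES(?!.*?BICYCLES).+')
print reg.sub(lambda mat: '@@@@' if mat.group(1) else mat.group(),ss)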

Result:

I gave two similar things to my two twin sons: two spankings, two nice BICYCLES of 300 dollars each, yes 600 dollars for two horridly nice BICYCLES, two times 300 dollars for two sons receiving two BICYCLES !, two dollars too, but never two dogs 


-------------------------------------------------------------
1) Replacing the more at right before the first "BICYCLES":

I gave two similar things to my two twin sons: two spankings, @@@@ nice BICYCLES of 300 dollars each, yes 600 dollars for two horridly nice BICYCLES, two times 300 dollars for two sons receiving two BICYCLES !, two dollars too, but never two dogs
-------------------------------------------------------------
2) Replacing the more at right before the last "BICYCLES":

I gave two similar things to my two twin sons: two spankings, two nice BICYCLES of 300 dollars each, yes 600 dollars for two horridly nice BICYCLES, two times 300 dollars for two sons receiving @@@@ BICYCLES !, two dollars too, but never two dogs
-------------------------------------------------------------
3) Replacing all before the first "BICYCLES":

I gave @@@@ similar things to my @@@@ twin sons: @@@@ spankings, @@@@ nice BICYCLES of 300 dollars each, yes 600 dollars for two horridly nice BICYCLES, two times 300 dollars for two sons receiving two BICYCLES !, two dollars too, but never two dogs
-------------------------------------------------------------
4) Replacing all before the last "BICYCLES":

I gave @@@@ similar things to my @@@@ twin sons: @@@@ spankings, @@@@ nice BICYCLES of 300 dollars each, yes 600 dollars for @@@@ horridly nice BICYCLES, @@@@ times 300 dollars for @@@@ sons receiving @@@@ BICYCLES !, two dollars too, but never two dogs
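
For reuse, case 1) can be wrapped in a small function. This is only a sketch in Python 3 syntax, not the code from the answer: the name `replace_nearest_before` and its parameters are made up for illustration, it uses `count=1` instead of the trailing `(.+)` group, it writes the lookahead as `(?:(?!two).)*?`, and it adds `re.DOTALL` on the assumption that the text read from a file may span several lines.

```python
import re

def replace_nearest_before(text, target, marker, replacement):
    """Replace the `target` closest to the left of the first `marker`.

    Hypothetical helper; returns `text` unchanged if nothing matches.
    """
    pattern = re.compile(
        re.escape(target)
        # Lookahead: only characters that do not start another `target`
        # may lie between this match and the next `marker`.
        + r'(?=(?:(?!' + re.escape(target) + r').)*?'
        + re.escape(marker) + r')',
        re.DOTALL,  # assumption: let "." cross line breaks
    )
    # A function as replacement avoids backslash handling in `replacement`;
    # count=1 stops after the first qualifying match.
    return pattern.sub(lambda m: replacement, text, count=1)

s = 'I own two cats, two (Some text in between...) bicycles and two dogs.'
print(replace_nearest_before(s, 'two', 'bicycles', 'three'))
```

Only the "two" directly before "(Some text in between...)" should change here; the ones before "cats" and "dogs" stay as they are.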


It is also possible without regular expressions:

line = '-------------------------------------------------------------\n'

ss = ('I gave two similar things to my two twin sons: '
      'two spankings, two nice BICYCLES of 300 dollars each, '
      'yes 600 dollars for two horridly nice BICYCLES, '
      'two times 300 dollars for two sons receiving two BICYCLES !, '
      'two dollars too, but never two dogs')
print ss,'\n\n'


print line + '1) Replacing the more at right before the first "BICYCLES":\n'
fb = ss.find('BICYCLES')
# find() returns -1 when nothing is found, so fb+1 is 0 (false) in that case
print '@@@@'.join(ss[0:fb].rsplit('two',1)) + ss[fb:] if fb+1 else ss


print line + '2) Replacing the more at right before the last "BICYCLES":\n'
fb = ss.rfind('BICYCLES')
print '@@@@'.join(ss[0:fb].rsplit('two',1)) + ss[fb:] if fb+1 else ss


print line + '3) Replacing all before the first "BICYCLES":\n'
fb = ss.find('BICYCLES')
print ss[0:fb].replace('two','@@@@') + ss[fb:] if fb+1 else ss


print line + '4) Replacing all before the last "BICYCLES":\n'
fb = ss.rfind('BICYCLES')
print ss[0:fb].replace('two','@@@@') + ss[fb:] if fb+1 else ss

The results are the same as with the regular expressions.
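
The four string-method variants above differ only in which "BICYCLES" is used (`find` or `rfind`) and in whether one or all "two" are replaced. A possible way to fold them into one function, written for Python 3; the function name and the two flags are invented here for illustration:

```python
def replace_before(text, target, marker, replacement,
                   last_marker=False, all_targets=False):
    """Replace `target` in the part of `text` that precedes `marker`.

    Hypothetical helper. last_marker picks the last `marker` instead of
    the first; all_targets replaces every `target` before it instead of
    only the nearest one.
    """
    pos = text.rfind(marker) if last_marker else text.find(marker)
    if pos == -1:          # marker absent: nothing to do
        return text
    head, tail = text[:pos], text[pos:]
    if all_targets:
        head = head.replace(target, replacement)
    else:
        # rsplit(..., 1) splits once at the rightmost target;
        # joining with the replacement swaps just that occurrence.
        head = replacement.join(head.rsplit(target, 1))
    return head + tail

ss = 'two a two BIKE two b BIKE two'
print(replace_before(ss, 'two', 'BIKE', '@'))
print(replace_before(ss, 'two', 'BIKE', '@', last_marker=True, all_targets=True))
```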


But regular expressions give more possibilities, for example matching "two" only as a whole word:

import re

ss = ('Mr Dotwo bought two gifts for his two sons, two hours ago: two BICYCLES '
      'because his two sons wanted only two BICYCLES')
print ss,'\n\n'


print 'Replacing all "two" before the first "BICYCLES":\n'
reg = re.compile('(\\btwo\\b)|BICYCLES.+')
print reg.sub(lambda mat: '@@@@' if mat.group(1) else mat.group(),ss)

Result:

Mr Dotwo bought two gifts for his two sons, two hours ago: two BICYCLES because his two sons wanted only two BICYCLES 


Replacing all "two" before the first "BICYCLES":

Mr Dotwo bought @@@@ gifts for his @@@@ sons, @@@@ hours ago: @@@@ BICYCLES because his two sons wanted only two BICYCLES
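
To make the difference visible in isolation, here is a short Python 3 comparison of a plain `str.replace` with the `\b` word-boundary pattern. The sample string is shortened from the one above, and the variable names are only for illustration.

```python
import re

text = 'Mr Dotwo bought two gifts'

# Plain substring replacement also hits the "two" inside "Dotwo".
plain = text.replace('two', '@@@@')

# \b matches only at a word boundary, so "Dotwo" is left alone.
bounded = re.sub(r'\btwo\b', '@@@@', text)

print(plain)    # Mr Do@@@@ bought @@@@ gifts
print(bounded)  # Mr Dotwo bought @@@@ gifts
```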

Upvotes: 0

phihag

Reputation: 287835

You could do this with regular expressions:

>>> import re
>>> s = 'I own two (Some text in between...) bicycles and two dogs.'
>>> re.sub('two(.*bicycles)', 'three\\1', s)
'I own three (Some text in between...) bicycles and two dogs.'

Or with regular string functions:

>>> try:
...   p = s.rindex('two', 0, s.index('bicycles'))
...   s[:p] + 'three' + s[p+len('two'):]
... except ValueError:
...   pass # No bicycles or no two
...
'I own three (Some text in between...) bicycles and two dogs.'
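
Since the question is about a text file, the string-function version could be applied to a file's contents roughly like this. It is a sketch in Python 3 that assumes the file is small enough to read into memory and is UTF-8 encoded; `replace_in_file` and the file name `notes.txt` are hypothetical.

```python
from pathlib import Path

def replace_in_file(path, target, marker, replacement, encoding='utf-8'):
    """Rewrite the file, replacing the `target` nearest to the left of
    the first `marker`.

    Hypothetical helper. Returns True if the file was changed, False if
    the marker is missing or no target precedes it.
    """
    file_path = Path(path)
    text = file_path.read_text(encoding=encoding)
    try:
        # Same index arithmetic as the string-function version above:
        # search for target only in the slice that ends at the marker.
        p = text.rindex(target, 0, text.index(marker))
    except ValueError:
        return False
    new_text = text[:p] + replacement + text[p + len(target):]
    file_path.write_text(new_text, encoding=encoding)
    return True

# Usage (file name is an assumption):
# replace_in_file('notes.txt', 'two', 'bicycles', 'three')
```

Because the file is overwritten in place, it may be worth keeping a copy of the original while trying this out.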

Upvotes: 1
