Reputation: 33
I'm pretty new to Python, but we are working on cleaning up some text files, and, among other things, I need to do the following:
Replace spaces with underscores, but only in some cases. The cases are such that the beginning is marked with /2, and the end is marked with /1.
E.g.:
Here is some text, /2This is an example/1 only.
I would like to turn this into:
Here is some text, This_is_an_example only.
I know how to do a universal replace (either with plain Python or with a regex), and I also know how to write a regex search that matches all the /2...../1 expressions. But I cannot figure out how to combine the two: to replace ONLY where the expression is found, and leave the rest of the text alone.
I would be very grateful for any suggestions!
People keep asking for the code I have and/or pointing me to the basic Python documentation. It is a relatively long program, since we have to do a lot of things with our input, and this is just one of them. It would be part of a series of find-and-replace steps; here are some others:
for x in handle:
    for r in (("^009", ""), ("/c", ""), ("#", ""), (r"\@", "")):
        x = x.replace(*r)
    # get rid of all remaining latex commands
    x = re.sub(r"\\[a-z]+", "", x)
    x = re.sub(r"\.h/.*?//", "", x)
    # get rid of punctuation
    x = re.sub(r'\.', '', x)
    x = re.sub(r',', '', x)
    x = re.sub(r';', '', x)
    x = re.sub('\n', ' \n', x)
    x = re.sub(r'\|.*?\|', '', x)
    x = re.sub("'", '', x)
    x = re.sub('"', '', x)
    # Here's an initial attempt
    y = re.findall(r'/2.*?/1', x)
    for item in y:
        title = re.sub(r'\s', '_', item)
    # but the question is how do I place these things back into x?
    s.write(x)
s.close()
handle.close()
Edit 2: Here is a(nother) thing that does NOT work:
for item in re.findall(r'/2.*?/1', x):
    item = re.sub(r'\s', '_', item)
Upvotes: 2
Views: 299
Reputation: 18641
Use re.sub with a lambda:
x = re.sub(r'/2.*?/1', lambda m: re.sub(r'\s+', '_', m.group()), x)
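This matches every substring between /2 and /1, and the nested re.sub replaces each run of whitespace with an underscore only inside those matches. The rest of the line is left untouched. Note that the /2 and /1 markers themselves stay in the result.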
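The expected output in the question has the /2 and /1 markers removed, so here is a minimal sketch of one way to get that, assuming the markers never span more than one line (the question reads the file line by line). The helper name `underscore_marked` and the `keep_markers` flag are made up for illustration and are not part of the original code. The idea is the same as above, but with a capture group so the replacement function can return only the text between the markers:

```python
import re

# Capture only the text between the markers; /2 and /1 stay outside the group.
MARKED = re.compile(r'/2(.*?)/1')


def underscore_marked(text, keep_markers=False):
    """Replace whitespace runs with '_' only inside /2.../1 spans.

    Hypothetical helper for illustration; by default the markers are dropped.
    """
    def fix(match):
        # match.group(1) is the text between /2 and /1
        inner = re.sub(r'\s+', '_', match.group(1))
        return f'/2{inner}/1' if keep_markers else inner

    return MARKED.sub(fix, text)


sample = "Here is some text, /2This is an example/1 only."
print(underscore_marked(sample))
# Here is some text, This_is_an_example only.
print(underscore_marked(sample, keep_markers=True))
# Here is some text, /2This_is_an_example/1 only.
```

Inside the loop from the question this would presumably be a single step such as `x = underscore_marked(x)`. If a marked span can contain a newline, the pattern would need `re.DOTALL` and the text would have to be processed as a whole rather than per line.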
Upvotes: 3