Reputation: 45
Data.txt contains words in a mix of upper and lower case.
I need to lowercase everything except the upper-case characters that appear in braces.
Each pair of braces immediately follows a word, which may end in either lower or upper case, and there is no space before the opening brace.
e.g.
CAT{TT} Dog{DD} Horse{AA}
Snail{LL} RAT{TT}
ANT{AA}
These should be transformed into:
cat{TT} dog{DD} horse{AA}
snail{LL} rat{TT}
ant{AA}
As a first step, I lowercased each line and stored the result in lcChar
(code below). I was then trying to find the lowercased characters within braces so that I could upper-case them again.
Being a Python newbie, I got stuck on the code below. It gives only the very first item in braces. I am also assuming I need another loop to upper-case all the items that appear in braces. Any help, please, so I can understand the best approach for handling this type of issue?
import re
f = open(r'C:\Python27\MyScripts\Data.txt')
for line in f:
    lcChar = (line.lower())
    patFinder1 = re.compile('{[a-z]+}')
    findPat1=re.findall(patFinder1, lcChar)
Upvotes: 0
Views: 68
Reputation: 44344
re.sub and re.subn allow the second parameter to be a function. A match object is passed into that function, and whatever the function returns is used as the replacement.
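To make the callback mechanism concrete, here is a minimal sketch on a made-up string. The function name shout and the sample text are illustrative assumptions, not part of the original question. It upper-cases only the brace groups of an already lowercased line, which is the second step the question was attempting.

```python
import re

def shout(match):
    # match.group(0) is the full text matched by the pattern
    return match.group(0).upper()

text = 'cat{tt} dog{dd}'  # hypothetical, already lowercased input

# re.sub calls shout once per match and splices in its return value
result = re.sub(r'\{[a-z]+\}', shout, text)
print(result)

# re.subn does the same but also reports how many replacements were made
new_text, count = re.subn(r'\{[a-z]+\}', shout, text)
print(count)
```

Because the function receives the match object rather than a plain string, it can also inspect individual groups or match positions before deciding what to return.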
This is my take on it:
import re

def manip(m):
    # Return the matched text in lower case; re.sub uses it as the replacement
    return m.group(1).lower()

data = ['CAT{TT} Dog{DD} Horse{AA}',
        'Snail{LL} RAT{TT}',
        'ANT{AA}']

for line in data:
    new_line = re.sub(r'((?:[^{]|^)[A-Z]+(?:[^}]|$))', manip, line)
    print(new_line)
Produces:
cat{TT} dog{DD} horse{AA}
snail{LL} rat{TT}
ant{AA}
I could have used a lambda instead, but that's arguably less clear.
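For comparison, a sketch of what the lambda form might look like, using the same pattern on one of the sample lines. The pattern has only been checked against the sample data, where every brace group holds exactly two letters, so treat it as an illustration rather than a general solution.

```python
import re

line = 'CAT{TT} Dog{DD} Horse{AA}'

# Same substitution, with the callback written inline as a lambda
new_line = re.sub(r'((?:[^{]|^)[A-Z]+(?:[^}]|$))',
                  lambda m: m.group(1).lower(),
                  line)
print(new_line)
```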
Upvotes: 2
Reputation: 6478
A straightforward way of doing it:
import re
regex = re.compile('([^}]*?{)')
str_ = '''CAT{TT} Dog{DD} Horse{AA}
Snail{LL} RAT{TT}
ANT{AA}'''
new_str = re.sub(regex, lambda match: match.groups()[0].lower(), str_)
assert new_str == '''cat{TT} dog{DD} horse{AA}
snail{LL} rat{TT}
ant{AA}'''
print(new_str)
The regex matches only what needs to be lowercased: each run of text up to and including an opening brace.
Each match is then replaced with its lowercase version.
Edit: more optimized version, using sub to do the replacement.
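To see which pieces that pattern actually picks out, a small sketch using re.findall on one sample line may help. The brace is escaped here as \{ for readability, which I assume behaves the same as the unescaped form used above, and the variable names are illustrative.

```python
import re

line = 'CAT{TT} Dog{DD} Horse{AA}'

# Each match is a run of non-'}' characters ending at the next '{',
# so the contents of the braces are never part of a match
pieces = re.findall(r'([^}]*?\{)', line)
print(pieces)

# Lowercasing only those pieces leaves the brace contents untouched
new_line = re.sub(r'([^}]*?\{)', lambda m: m.group(1).lower(), line)
print(new_line)
```

Since the brace contents never fall inside a match, this approach does not depend on how many characters each brace group holds.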
Upvotes: 1