Reputation: 2836
I have a file with multiple lines in it like this:
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}
I want to replace the 1371078139195 (in this case) with another number. The value I want to replace is always in the first comma-separated field and is always the second-last underscore-separated value in that field. The following is how I did it; it works, but it seems clumsy.
>>> line="'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"
>>> l1=",".join(line.split(",")[1:])
>>> print l1
{'cf:rv': '0'}
>>> l2=line.split(",")[0]
>>> print l2
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442'
>>> print "_".join(l2.split('_')[:-2])
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight
>>>
>>> print "_".join(l2.split('_')[:-2])+ "_1234567_"+(l2.split('_')[-1])
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1234567_+14155186442'
>>> print "_".join(l2.split('_')[:-2])+ "_1234567_"+(l2.split('_')[-1]) + "," + l1
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1234567_+14155186442', {'cf:rv': '0'}
>>>
Is there an easier way to replace the value (maybe using regular expressions)? I can't imagine that this is the best way.
I have received a few answers, and I have to stress that it's the second-last underscore-separated value. The following are valid strings:
line = "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"
line = "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_14155186442', {'cf:rv': '0'}"
line = "'AMS_Investigation|txtt.co_1371078139195_BigtittedBlondOtherNight_1371078139195_1371078139195', {'cf:rv': '0'}"
In the above cases there is a digit string within the first field that is not the second-last underscore-separated value. Also, the last part may or may not be all digits (it could be +14155186442 or it could be 14155186442). Sorry I didn't mention this above.
Upvotes: 3
Views: 1310
Reputation: 114491
Using regular expressions:
import re

m = re.match("([^,]*_)([+]?[0-9]+)(_.*)", s)
if m:
    before = m.group(1)
    number = m.group(2)
    after = m.group(3)
    # new_number() stands for your own function returning the replacement string
    s = before + new_number(number) + after
the meaning is
[^,]*_ = any number of characters other than commas, followed by an underscore
[+]?[0-9]+ = digits, optionally preceded by +
_.* = an underscore followed by whatever is there
This works because regexp matches are by default "greedy", so [^,]* will consume as many underscores as it can, stopping right before the second-last one, which is the last position where the rest of the match can still succeed.
If, for example, you need the third-last underscore-separated value instead of the second-last, the expression could be changed to
m = re.match("([^,]*_)([+]?[0-9]+)(_[^,]*_.*)", s)
thus requiring that after the number there are at least two underscores before a comma.
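As a minimal sketch, the second-last pattern above could be wrapped in a function. This assumes Python 3; the name replace_second_last and the decision to return non-matching lines unchanged are illustrative choices, not part of the original answer. Slicing around the span of group 2 keeps everything else in the line, including a trailing newline, untouched.

```python
import re

# Same pattern as above: greedy prefix, the number, then the remainder.
PATTERN = re.compile(r"([^,]*_)([+]?[0-9]+)(_.*)")


def replace_second_last(line, new_value):
    """Return line with the number matched by group 2 replaced by new_value.

    Hypothetical helper: lines that do not match are returned unchanged.
    """
    m = PATTERN.match(line)
    if m is None:
        return line
    # Splice the replacement in at the exact position of group 2.
    return line[:m.start(2)] + new_value + line[m.end(2):]


lines = [
    "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}",
    "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_14155186442', {'cf:rv': '0'}",
    "'AMS_Investigation|txtt.co_1371078139195_BigtittedBlondOtherNight_1371078139195_1371078139195', {'cf:rv': '0'}",
]

for line in lines:
    print(replace_second_last(line, "1234567"))
```

Note that the pattern only lands on the second-last value when that value is numeric; if it is not, the match falls back to an earlier number in the field.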
Upvotes: 4
Reputation: 123473
Not as sophisticated as a regex, but relatively simple to code, understand, debug, and change in the future. Other than the separator characters, it makes no assumptions about what characters make up a "word".
def replace_term(line, replacement):
    csep = line.split(',')
    usep = csep[0].split('_')
    return ','.join(['_'.join(usep[:-2] + [replacement] + usep[-1:])] + csep[1:])

lines = ["'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}",
         "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_14155186442', {'cf:rv': '0'}",
         "'AMS_Investigation|txtt.co_1371078139195_BigtittedBlondOtherNight_1371078139195_1371078139195', {'cf:rv': '0'}"]

for line in lines:
    print replace_term(line, 'XXX')
Output:
'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_XXX_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_XXX_14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_1371078139195_BigtittedBlondOtherNight_XXX_1371078139195', {'cf:rv': '0'}
Upvotes: 0
Reputation: 6710
Like this?
>>> line = "'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"
>>> re.subn('_(\d+)_', '_mynewnumber_', line, count=1)
("'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_mynewnumber_+14155186442', {'cf:rv': '0'}",
1)
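Note that with count=1 this replaces the first `_digits_` run in the line, which is not necessarily the second-last value. The sketch below (Python 3 assumed) shows this on one of the later examples from the question, next to a lookahead variant. The function name replace_anchored and its pattern are illustrative assumptions, not part of this answer.

```python
import re

line = "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"

# First-match substitution: this hits _23456_, not the second-last value.
first_match, _count = re.subn(r"_(\d+)_", "_mynewnumber_", line, count=1)


def replace_anchored(text, new_value):
    # Illustrative variant: the lookahead only accepts a number that is
    # followed by exactly one more underscore-free value before a comma.
    return re.sub(r"_\d+(?=_[^_,]*,)", lambda m: "_" + new_value, text, count=1)


print(first_match)
print(replace_anchored(line, "mynewnumber"))
```

The replacement is passed as a function so that backslashes in new_value are not interpreted as group references.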
Upvotes: 1
Reputation: 250961
Non-regex solution:
>>> strs = " 'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"
>>> first, sep, rest = strs.partition(',')
>>> lis = first.rsplit('_', 2)
>>> lis[1] = "1111111"
>>> "_".join(lis) + sep + rest
" 'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1111111_+14155186442', {'cf:rv': '0'}"
Function:
>>> def solve(strs, rep):
...     first, sep, rest = strs.partition(',')
...     lis = first.rsplit('_', 2)
...     lis[1] = rep
...     return "_".join(lis) + sep + rest
...
>>> solve(" 'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}", "1111")
" 'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1111_+14155186442', {'cf:rv': '0'}"
>>> solve("'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_14155186442', {'cf:rv': '0'}", "2222")
"'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_2222_14155186442', {'cf:rv': '0'}"
>>> solve("'AMS_Investigation|txtt.co_1371078139195_BigtittedBlondOtherNight_1371078139195_1371078139195', {'cf:rv': '0'}", "2222")
"'AMS_Investigation|txtt.co_1371078139195_BigtittedBlondOtherNight_2222_1371078139195', {'cf:rv': '0'}"
Upvotes: 3
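The solve function in the partition/rsplit answer above assumes the first field contains at least two underscores. With only one underscore, lis[1] is the last value rather than the second-last, and with none it raises an IndexError. A guarded sketch, assuming Python 3 (solve_safe and the sample list are hypothetical names for illustration):

```python
def solve_safe(line, rep):
    """Replace the second-last '_'-separated value in the first comma field."""
    first, sep, rest = line.partition(',')
    parts = first.rsplit('_', 2)
    if len(parts) < 2:
        # No underscore at all: there is no second-last value to replace.
        return line
    # After rsplit('_', 2), index -2 is the second-last value
    # whether the field had one underscore or many.
    parts[-2] = rep
    return '_'.join(parts) + sep + rest


sample = [
    "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_14155186442', {'cf:rv': '0'}\n",
    "no underscores here, {'cf:rv': '0'}\n",
]
updated = [solve_safe(line, "2222") for line in sample]
for line in updated:
    print(line, end="")
```

Because partition keeps everything after the first comma in rest, trailing newlines from lines read out of a file pass through unchanged.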
Reputation: 27575
import re

r = re.compile('([^,]*_)(\d+)(?=_[^_,]+,)(_.*)')

for line in ("'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}",
             "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"):
    print line
    print r.sub('\\1ABCDEFG\\3',line)
    print r.sub('\g<1>1234567\\3',line)
result
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_ABCDEFG_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1234567_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_ABCDEFG_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1234567_+14155186442', {'cf:rv': '0'}
\g<1> means 'group 1'.
See in the doc:
In addition to character escapes and backreferences as described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn't ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'. The backreference \g<0> substitutes in the entire substring matched by the RE.
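A small sketch of the ambiguity the doc describes, assuming Python 3; the pattern and the input string are made up for illustration:

```python
import re

pattern = re.compile(r"([a-z]+)_(\d+)")

# \g<2>0 means: group 2, followed by a literal '0'.
unambiguous = pattern.sub(r"\g<2>0", "abc_12")
print(unambiguous)

# \20 is read as a reference to group 20, which this pattern does not have.
raised = False
try:
    pattern.sub(r"\20", "abc_12")
except re.error as exc:
    raised = True
    print("error:", exc)
```

This is why the code above writes \g<1>1234567 rather than \11234567 when the replacement text itself starts with a digit.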
Upvotes: 0