Reputation: 47
I have a string 'companydocuments' inside a .txt file.
I need to count all occurrences of this string and append a consecutive number to each one,
e.g. 'companydocuments' occurs 405 times, so the occurrences have to become 'companydocuments1', 'companydocuments2', and so on up to the last one ('companydocuments405'), and the changes must be saved to the file.
The aim is to use those strings as references further on in the code to decide whether or not to perform certain operations.
My code does not work: it replaces every occurrence with the same value, always with the last number,
e.g. 'companydocuments405' for each record, and it does not save anything to the file.
#!/usr/bin/python
#Python 2.7.12
import re, os, string
with open('1.txt', 'r') as myfile:
    lenght = myfile.read()
    a = lenght.count('COMPANYDOCUMENTS')
    a2 = re.findall('COMPANYDOCUMENTS', lenght)
    for i in range(a):
        string = 'COMPANYDOCUMENTS'
        b = [string + str(i) for i in range(a)]
        a2 = b[:]
        a3 = str(a2)
        content1 = lenght.replace('COMPANYDOCUMENTS', a3)
myfile = open('1.txt', 'w')
myfile.write(content1)
myfile.close()
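Two behaviours in the code above can be checked in isolation: `str.replace` substitutes one fixed string for every match, and `str()` of a list produces the list's printed form. In addition, each `replace` call starts again from the unchanged `lenght`, so nothing accumulates between calls. The snippet below uses a made-up sample string to illustrate the first two points; it is a diagnostic sketch, not a fix.

```python
# Made-up sample text with two occurrences of the search string.
text = "x COMPANYDOCUMENTS y COMPANYDOCUMENTS z"

# str.replace uses the SAME replacement for every match,
# so a single call can never produce 1, 2, 3, ...
same = text.replace("COMPANYDOCUMENTS", "COMPANYDOCUMENTS405")
print(same)  # x COMPANYDOCUMENTS405 y COMPANYDOCUMENTS405 z

# Passing str(list) inserts the list's printed form at every match.
labels = ["COMPANYDOCUMENTS" + str(i) for i in range(2)]
with_list = text.replace("COMPANYDOCUMENTS", str(labels))
print(with_list)
```

Any working solution therefore needs a counter that advances once per match, which is what the answers below provide in different ways.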
Upvotes: 2
Views: 353
Reputation: 135
Not the most efficient way, but it works:
readen = "sometext companydocuments sometext companydocuments ..."
delimiter = "companydocuments"
result = ""
index = 0  # counts the pieces produced by split(); ends at (occurrences + 1)
for i in readen.split(delimiter):
    index += 1
    # add the intermediate text (i), the delimiter and the index to the result
    result += i + delimiter + str(index)
# the last piece of the split list is followed by a delimiter and an index
# that are not needed, so remove them
result = result[0:-(len(str(index)) + len(delimiter))]
# result now holds "sometext companydocuments1 sometext companydocuments2 ..."
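A slightly tidier variant of the same split-based idea, offered as a sketch: every piece after the first follows exactly one match, so the numbered delimiter can be prepended to those pieces and no trailing delimiter has to be trimmed off. The function name `number_occurrences` is hypothetical.

```python
def number_occurrences(text, word):
    """Append 1, 2, 3, ... to each occurrence of `word` in `text`."""
    pieces = text.split(word)
    # pieces[0] comes before the first match; each later piece follows one match
    numbered = [word + str(n) + piece for n, piece in enumerate(pieces[1:], 1)]
    return pieces[0] + "".join(numbered)


print(number_occurrences("sometext companydocuments sometext companydocuments ...",
                         "companydocuments"))
# sometext companydocuments1 sometext companydocuments2 ...
```

If `word` does not occur at all, `split()` returns a one-element list and the text comes back unchanged.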
Upvotes: 0
Reputation: 7241
There is a simpler way to do this. First, let me demonstrate with a short string:
>>> a = "ABCHCYEQCUWC"
>>> import re
>>> re.split('(C)', a)
['AB', 'C', 'H', 'C', 'YEQ', 'C', 'UW', 'C', '']
The re module has a split() function that is similar to the string split() method, except that if you put the regex in parentheses, the separators are kept in the result. I leverage this feature to produce a list of tokens in which every other token is the string you're interested in (yours is "COMPANYDOCUMENTS", mine is "C"). Now save it into a list:
>>> tokens = re.split('(C)', a)
>>> tokens[1::2]
['C', 'C', 'C', 'C']
So we want to modify these separators by appending a sequence number, which is easy in Python with enumerate() and a list comprehension:
>>> [x+str(i+1) for i,x in enumerate(tokens[1::2])]
['C1', 'C2', 'C3', 'C4']
And now you can put the numbered separators back into the token list and rebuild the output string:
>>> tokens[1::2] = [x+str(i+1) for i,x in enumerate(tokens[1::2])]
>>> tokens
['AB', 'C1', 'H', 'C2', 'YEQ', 'C3', 'UW', 'C4', '']
>>> "".join(tokens)
'ABC1HC2YEQC3UWC4'
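The same steps can be wrapped into a reusable function, sketched below. The name `number_tokens` is hypothetical; `re.escape()` is added (not part of the answer above) so that a search string containing regex metacharacters is matched literally.

```python
import re


def number_tokens(text, word):
    """Number each occurrence of `word` using re.split with a capturing group."""
    # The capturing group keeps the separators at odd indices 1, 3, 5, ...
    tokens = re.split("(" + re.escape(word) + ")", text)
    # Replace the separators in place with numbered versions (starting at 1).
    tokens[1::2] = [sep + str(n) for n, sep in enumerate(tokens[1::2], 1)]
    return "".join(tokens)


print(number_tokens("a COMPANYDOCUMENTS b COMPANYDOCUMENTS c", "COMPANYDOCUMENTS"))
# a COMPANYDOCUMENTS1 b COMPANYDOCUMENTS2 c
```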
Upvotes: 0
Reputation: 107124
You can use re.sub with a replacement function that concatenates the match with a counter (using itertools.count):
from itertools import count
import re
lenght = 'abc companydocuments xyz companydocuments def companydocuments 123'
c = count(1)
print(re.sub('companydocuments', lambda m: m.group() + str(next(c)), lenght))
This outputs:
abc companydocuments1 xyz companydocuments2 def companydocuments3 123
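None of the answers writes the result back to the file, which the question also asks for. A minimal sketch of that round trip, reusing the `re.sub` + `itertools.count` approach, is shown below. It uses `re.subn` so the number of replacements is returned as well; the function name `number_in_file` and the temporary demo file are hypothetical, and the whole file is assumed to fit in memory.

```python
import os
import re
import tempfile
from itertools import count


def number_in_file(path, word):
    """Append 1, 2, 3, ... to each occurrence of `word` in the file at `path`,
    write the result back in place, and return the number of replacements."""
    with open(path) as f:
        text = f.read()
    counter = count(1)
    # re.subn returns (new_text, number_of_substitutions)
    new_text, n = re.subn(re.escape(word),
                          lambda m: m.group() + str(next(counter)), text)
    # The read handle is already closed when the file is reopened for writing.
    with open(path, "w") as f:
        f.write(new_text)
    return n


# Demo on a throwaway temporary file with made-up contents.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("x COMPANYDOCUMENTS y COMPANYDOCUMENTS z\n")
demo_count = number_in_file(demo_path, "COMPANYDOCUMENTS")
with open(demo_path) as f:
    demo_text = f.read()
os.remove(demo_path)
print(demo_count)  # 2
print(demo_text)   # x COMPANYDOCUMENTS1 y COMPANYDOCUMENTS2 z
```

For the question's file this would be called as `number_in_file('1.txt', 'COMPANYDOCUMENTS')`.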
Upvotes: 1