Emiliano Spada

Reputation: 47

Python - How to replace all occurrences of a substring with consecutively numbered versions and save the changes to the main string?

I have a string 'companydocuments' inside a txt file.
I need to count all occurrences of that string and replace each one with the string followed by its consecutive number,
e.g. if 'companydocuments' is found 405 times, the occurrences have to become 'companydocuments1', 'companydocuments2', and so on up to the last one (405), and the changes have to be saved to the file.
The aim is to use those strings as references further on in the code to decide whether or not to perform certain operations.
My code does not work correctly: it always replaces every occurrence with the last number,
e.g. 'companydocuments405' for each record, and it does not save anything to the file.

#!/usr/bin/python
#Python 2.7.12

import re, os, string
with open('1.txt', 'r') as myfile:  
   lenght = myfile.read()
   a = lenght.count('COMPANYDOCUMENTS')
   a2 = re.findall('COMPANYDOCUMENTS', lenght)
   for i in range(a):
     string = 'COMPANYDOCUMENTS'
     b = [string + str(i) for i in range(a)]
     a2 = b[:]
     a3 = str(a2)
   content1 = lenght.replace('COMPANYDOCUMENTS', a3)
   myfile = open('1.txt', 'w')
   myfile.write(content1)
   myfile.close()

Upvotes: 2

Views: 353

Answers (3)

Bence

Reputation: 135

Not the most efficient way, but it works:

readen = "sometext companydocument sometext companydocument ..."
delimiter = "companydocument"

result = ""
index = 0  # counts the pieces produced by the split

for i in readen.split(delimiter):
    index += 1
    # append the intermediate text (i), the delimiter and the running index
    result += i + delimiter + str(index)

# the last piece is not followed by the delimiter in the original text,
# so strip the extra delimiter and index that were appended after it
result = result[:-(len(str(index)) + len(delimiter))]

# result now holds "sometext companydocument1 sometext companydocument2 ..."
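
The same split-based idea can be wrapped in a function that only adds a number between pieces, so nothing has to be trimmed afterwards. This is just a sketch; the function name number_occurrences is hypothetical and not part of the code above.

```python
def number_occurrences(text, delimiter):
    """Append 1, 2, 3, ... to each occurrence of delimiter in text."""
    parts = text.split(delimiter)
    pieces = [parts[0]]
    # Every part after the first was preceded by exactly one delimiter.
    for index, part in enumerate(parts[1:], start=1):
        pieces.append(delimiter + str(index) + part)
    return "".join(pieces)


result = number_occurrences(
    "sometext companydocument sometext companydocument ...",
    "companydocument",
)
print(result)
```

If the delimiter never occurs, the text comes back unchanged.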

Upvotes: 0

adrtam

Reputation: 7241

There is a simpler way to do this. First, let me start with a sample string:

>>> a = "ABCHCYEQCUWC"
>>> import re
>>> re.split('(C)', a)
['AB', 'C', 'H', 'C', 'YEQ', 'C', 'UW', 'C', '']

The re module has a split() function that is similar to the string split() method, except that if you put the regex in parentheses (a capturing group), the separators are kept in the result. I use this feature to produce a list of tokens in which every other token is the string you're interested in (yours is "COMPANYDOCUMENTS", mine is "C"). Now save it into a list:

>>> tokens = re.split('(C)', a)
>>> tokens[1::2]
['C', 'C', 'C', 'C']

Next, we want to modify these separators by appending a sequence number, which is easy in Python with enumerate() and a list comprehension:

>>> [x+str(i+1) for i,x in enumerate(tokens[1::2])]
['C1', 'C2', 'C3', 'C4']

Now you can assign the numbered separators back into the token list and rebuild the output string:

>>> tokens[1::2] = [x+str(i+1) for i,x in enumerate(tokens[1::2])]
>>> tokens
['AB', 'C1', 'H', 'C2', 'YEQ', 'C3', 'UW', 'C4', '']
>>> "".join(tokens)
'ABC1HC2YEQC3UWC4'
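
Putting the steps above together for a multi-character target might look like the sketch below. The function name number_with_split and the sample text are assumptions made for illustration; re.escape is added so that a target containing regex metacharacters is still matched literally.

```python
import re


def number_with_split(text, target):
    # The capturing group makes re.split keep each match in the token list.
    tokens = re.split('(' + re.escape(target) + ')', text)
    # Matches sit at the odd indices: 1, 3, 5, ...
    tokens[1::2] = [tok + str(i) for i, tok in enumerate(tokens[1::2], start=1)]
    return "".join(tokens)


numbered = number_with_split('abc COMPANYDOCUMENTS xyz COMPANYDOCUMENTS',
                             'COMPANYDOCUMENTS')
print(numbered)
```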

Upvotes: 0

blhsing

Reputation: 107124

You can use re.sub with a replacement function that concatenates the match with a counter (using itertools.count):

from itertools import count
import re
lenght = 'abc companydocuments xyz companydocuments def companydocuments 123'
c = count(1)
print(re.sub('companydocuments', lambda m: m.group() + str(next(c)), lenght))

This outputs:

abc companydocuments1 xyz companydocuments2 def companydocuments3 123
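
Since the question also asks for the result to be saved, here is a rough sketch of the same re.sub approach applied to a file. The function name number_in_file and the temporary demo file are assumptions for illustration only; the file is read completely and closed before it is reopened for writing.

```python
from itertools import count
import os
import re
import tempfile


def number_in_file(path, target):
    # Read the whole file first, then close it before writing.
    with open(path, 'r') as infile:
        text = infile.read()
    counter = count(1)
    # re.escape keeps the target literal; the lambda appends 1, 2, 3, ...
    new_text = re.sub(re.escape(target),
                      lambda m: m.group() + str(next(counter)),
                      text)
    with open(path, 'w') as outfile:
        outfile.write(new_text)
    return new_text


# Demo on a throwaway file (hypothetical content, not the asker's 1.txt).
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(demo_path, 'w') as f:
    f.write('a companydocuments b companydocuments')
print(number_in_file(demo_path, 'companydocuments'))
```

A fresh counter is created on every call, so numbering restarts at 1 for each file.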

Upvotes: 1
