zer0stimulus

Reputation: 23616

Python: How to replace N random string occurrences in text?

Say that I have 10 different tokens, "(TOKEN)" in a string. How do I replace 2 of those tokens, chosen at random, with some other string, leaving the other tokens intact?

Upvotes: 4

Views: 1205

Answers (7)

jamylak

Reputation: 133574

>>> import random
>>> text = '(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)'
>>> token = '(TOKEN)'
>>> replace = 'foo'
>>> num_replacements = 2
>>> num_tokens = text.count(token) #10 in this case
>>> points = [0] + sorted(random.sample(range(1,num_tokens+1),num_replacements)) + [num_tokens+1]
>>> replace.join(token.join(text.split(token)[i:j]) for i,j in zip(points,points[1:]))
'(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__foo__(TOKEN)__foo__(TOKEN)__(TOKEN)__(TOKEN)'

In function form:

>>> def random_replace(text, token, replace, num_replacements):
        num_tokens = text.count(token)
        points = [0] + sorted(random.sample(range(1,num_tokens+1),num_replacements)) + [num_tokens+1]
        return replace.join(token.join(text.split(token)[i:j]) for i,j in zip(points,points[1:]))

>>> random_replace('....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....','(TOKEN)','FOO',2)
'....FOO....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....FOO....'

Test:

>>> for i in range(0,9):
        print(random_replace('....(0)....(0)....(0)....(0)....(0)....(0)....(0)....(0)....','(0)','(%d)'%i,i))


....(0)....(0)....(0)....(0)....(0)....(0)....(0)....(0)....
....(0)....(0)....(0)....(0)....(1)....(0)....(0)....(0)....
....(0)....(0)....(0)....(0)....(0)....(2)....(2)....(0)....
....(3)....(0)....(0)....(3)....(0)....(3)....(0)....(0)....
....(4)....(4)....(0)....(0)....(4)....(4)....(0)....(0)....
....(0)....(5)....(5)....(5)....(5)....(0)....(0)....(5)....
....(6)....(6)....(6)....(0)....(6)....(0)....(6)....(6)....
....(7)....(7)....(7)....(7)....(7)....(7)....(0)....(7)....
....(8)....(8)....(8)....(8)....(8)....(8)....(8)....(8)....

Upvotes: 2

ChessMaster

Reputation: 549

from random import sample

mystr = 'adad(TOKEN)hgfh(TOKEN)hjgjh(TOKEN)kjhk(TOKEN)jkhjk(TOKEN)utuy(TOKEN)tyuu(TOKEN)tyuy(TOKEN)tyuy(TOKEN)tyuy(TOKEN)'

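def replace(mystr, substr, n_repl, replacement='XXXXXXX', tokens=None, index=0):
    # Count the occurrences unless the caller already knows how many there are.
    if tokens is None:
        tokens = mystr.count(substr)
    choices = sorted(sample(range(tokens),n_repl))
    for i in range(choices[-1]+1):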
        index = mystr.index(substr, index) + 1
        if i in choices:
            mystr = mystr[:index-1] + mystr[index-1:].replace(substr,replacement,1)
    return mystr

print(replace(mystr,'(TOKEN)',2))

Upvotes: 0

rob mayoff

Reputation: 385670

There are lots of ways to do this. My approach would be to write a function that takes the original string, the token string, and a function that returns the replacement text for an occurrence of the token in the original:

def strByReplacingTokensUsingFunction(original, token, function):
    outputComponents = []
    matchNumber = 0
    unexaminedOffset = 0
    while True:
        matchOffset = original.find(token, unexaminedOffset)
        if matchOffset < 0:
            matchOffset = len(original)
        outputComponents.append(original[unexaminedOffset:matchOffset])
        if matchOffset == len(original):
            break
        unexaminedOffset = matchOffset + len(token)
        replacement = function(original=original, offset=matchOffset, matchNumber=matchNumber, token=token)
        outputComponents.append(replacement)
        matchNumber += 1
    return ''.join(outputComponents)

(You could certainly change this to use shorter identifiers. My style is somewhat more verbose than typical Python style.)

Given that function, it's easy to replace two random occurrences out of ten. Here's some sample input:

sampleInput = 'a(TOKEN)b(TOKEN)c(TOKEN)d(TOKEN)e(TOKEN)f(TOKEN)g(TOKEN)h(TOKEN)i(TOKEN)j(TOKEN)k'

The random module has a handy method for picking random items from a population (not picking the same item twice):

import random
replacementIndexes = random.sample(range(10), 2)

Then we can use the function above to replace the randomly-chosen occurrences:

sampleOutput = strByReplacingTokensUsingFunction(sampleInput, '(TOKEN)',
    (lambda matchNumber, token, **keywords:
        'REPLACEMENT' if (matchNumber in replacementIndexes) else token))
print(sampleOutput)

And here's some test output:

a(TOKEN)b(TOKEN)cREPLACEMENTd(TOKEN)e(TOKEN)fREPLACEMENTg(TOKEN)h(TOKEN)i(TOKEN)j(TOKEN)k

Here's another run:

a(TOKEN)bREPLACEMENTc(TOKEN)d(TOKEN)e(TOKEN)f(TOKEN)gREPLACEMENTh(TOKEN)i(TOKEN)j(TOKEN)k
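
For comparison, Python's re.sub accepts a function as the replacement argument, which gives the same per-occurrence callback without a hand-written scanning loop. The following is a minimal sketch rather than part of the original answer: the name replace_random_occurrences and its parameters are made up for illustration, and it assumes a non-empty token that occurs at least count times.

```python
import itertools
import random
import re

def replace_random_occurrences(text, token, replacement, count):
    """Replace `count` randomly chosen occurrences of `token` in `text`."""
    total = text.count(token)
    # Pick which occurrences (0-based) to replace; sample() never repeats.
    chosen = set(random.sample(range(total), count))
    match_number = itertools.count()

    def substitute(match):
        # re.sub calls this once per match, scanning left to right.
        return replacement if next(match_number) in chosen else match.group(0)

    # re.escape makes the token match literally, parentheses included.
    return re.sub(re.escape(token), substitute, text)

sample_input = 'a(TOKEN)b(TOKEN)c(TOKEN)d(TOKEN)e'
print(replace_random_occurrences(sample_input, '(TOKEN)', 'REPLACEMENT', 2))
```

Which two occurrences get replaced varies from run to run; random.sample raises ValueError if count exceeds the number of occurrences.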

Upvotes: 1

Óscar López

Reputation: 236014

Try this solution:

import random

def replace_random(tokens, eqv, n):
    # Copy the keys into a list so they can be shuffled in place.
    random_tokens = list(eqv)
    random.shuffle(random_tokens)
    for i in range(n):
        t = random_tokens[i]
        tokens = tokens.replace(t, eqv[t])
    return tokens

Assuming that a string with tokens exists, and a suitable equivalence table can be constructed with a replacement for each token:

tokens = '(TOKEN1) (TOKEN2) (TOKEN3) (TOKEN4) (TOKEN5) (TOKEN6) (TOKEN7) (TOKEN8) (TOKEN9) (TOKEN10)'

equivalences = {
    '(TOKEN1)' : 'REPLACEMENT1',
    '(TOKEN2)' : 'REPLACEMENT2',
    '(TOKEN3)' : 'REPLACEMENT3',
    '(TOKEN4)' : 'REPLACEMENT4',
    '(TOKEN5)' : 'REPLACEMENT5',
    '(TOKEN6)' : 'REPLACEMENT6',
    '(TOKEN7)' : 'REPLACEMENT7',
    '(TOKEN8)' : 'REPLACEMENT8',
    '(TOKEN9)' : 'REPLACEMENT9',
    '(TOKEN10)' : 'REPLACEMENT10'
}

You can call it like this:

replace_random(tokens, equivalences, 2)
> '(TOKEN1) REPLACEMENT2 (TOKEN3) (TOKEN4) (TOKEN5) (TOKEN6) (TOKEN7) (TOKEN8) REPLACEMENT9 (TOKEN10)'

Upvotes: 1

Alexander Putilin

Reputation: 2342

My solution in code:

import random

s = "(TOKEN)test(TOKEN)fgsfds(TOKEN)qwerty(TOKEN)42(TOKEN)(TOKEN)ttt"
replace_from = "(TOKEN)"
replace_to = "[REPLACED]"
amount_to_replace = 2

def random_replace(s, replace_from, replace_to, amount_to_replace):
    parts = s.split(replace_from)
    indices = random.sample(range(len(parts) - 1), amount_to_replace)

    replaced_s_parts = list()

    for i in range(len(parts)):
        replaced_s_parts.append(parts[i])
        if i < len(parts) - 1:
            if i in indices:
                replaced_s_parts.append(replace_to)
            else:
                replaced_s_parts.append(replace_from)

    return "".join(replaced_s_parts)

#TEST

for i in range(5):
    print(random_replace(s, replace_from, replace_to, amount_to_replace))

Explanation:

  1. Split the string into parts, using replace_from as the separator.
  2. Choose the indexes of the tokens to replace with random.sample, which returns a list of unique numbers.
  3. Build a list for reconstructing the string, substituting replace_to for the tokens at the chosen indexes.
  4. Concatenate all list elements into a single string.

Upvotes: 1

Garen

Reputation: 961

What are you trying to do, exactly? A good answer will depend on that...

That said, a brute-force solution that comes to mind is to:

  1. Store the 10 tokens in an array, such that tokens[0] is the first token, tokens[1] is the second, ... and so on
  2. Create a dictionary to associate each unique "(TOKEN)" with two numbers: start_idx, end_idx
  3. Write a little parser that walks through your string and looks for each of the 10 tokens. Whenever one is found, record the start/end indexes (as start_idx, end_idx) in the string where that token occurs.
  4. Once done parsing, generate a random number in the range [0,9]. Let's call this R.
  5. Now, your random "(TOKEN)" is tokens[R].
  6. Use the dictionary created in step (2) and filled in during step (3) to find the start_idx, end_idx values in the string; replace the text there with "some other string"
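
The steps above could be sketched roughly as follows. This is one possible reading, not a definitive implementation: the helper names find_token_spans and replace_at_random_spans are invented here, a list of (start_idx, end_idx) pairs stands in for the dictionary, and a count parameter is added so that two tokens can be replaced instead of one. It assumes a non-empty token.

```python
import random

def find_token_spans(text, token):
    """Return (start_idx, end_idx) for each non-overlapping occurrence of token."""
    if not token:
        raise ValueError('token must be non-empty')
    spans = []
    start = text.find(token)
    while start != -1:
        end = start + len(token)
        spans.append((start, end))
        start = text.find(token, end)
    return spans

def replace_at_random_spans(text, token, replacement, count=1):
    """Replace `count` randomly chosen occurrences by splicing at recorded offsets."""
    spans = find_token_spans(text, token)
    # Splice from right to left so the offsets of earlier tokens stay valid.
    for start, end in sorted(random.sample(spans, count), reverse=True):
        text = text[:start] + replacement + text[end:]
    return text

print(replace_at_random_spans('x(TOKEN)y(TOKEN)z(TOKEN)', '(TOKEN)', 'some other string', 2))
```

Replacing from the rightmost chosen span first matters because each splice changes the length of the string, which would otherwise invalidate the recorded indexes of later tokens.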

Upvotes: 1

Eli Bendersky

Reputation: 273526

If you need exactly two, then:

  1. Detect the tokens (keep some links to them, like index into the string)
  2. Choose two at random (random.sample, which never picks the same one twice)
  3. Replace them
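
As a rough illustration of these three steps, here is a small sketch. The function name replace_n_at_random and the how_many parameter are assumptions made for the example. It uses re.finditer to detect the tokens and random.sample for the choice, so that the same occurrence cannot be selected twice.

```python
import random
import re

def replace_n_at_random(text, token, replacement, how_many=2):
    # 1. Detect the tokens: keep each match's (start, end) offsets.
    spans = [m.span() for m in re.finditer(re.escape(token), text)]
    # 2. Choose distinct occurrences at random.
    chosen = set(random.sample(spans, how_many))
    # 3. Replace them, rebuilding the string in one left-to-right pass.
    pieces = []
    cursor = 0
    for start, end in spans:
        pieces.append(text[cursor:start])
        pieces.append(replacement if (start, end) in chosen else token)
        cursor = end
    pieces.append(text[cursor:])
    return ''.join(pieces)

print(replace_n_at_random('(TOKEN)-(TOKEN)-(TOKEN)-(TOKEN)', '(TOKEN)', 'foo'))
```

Collecting the pieces in a list and joining once avoids repeated string copying when the text contains many tokens.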

Upvotes: 1
