Nathan

Reputation: 167

Python: iterate over a file and replace strings

I'm using the 're' library to replace occurrences of different strings in multiple files. The replacement pattern works fine, but I'm not able to maintain the changes to the files. I'm trying to get the same functionality that comes with the following lines:

    with open(KEY_FILE, mode='r', encoding='utf-8-sig') as f:
        replacements = csv.DictReader(f)
        user_data = open(temp_file, 'r').read()

        for col in replacements:
            user_data = user_data.replace(col[ORIGINAL_COLUMN], col[TARGET_COLUMN])

        data_output = open(f"{temp_file}", 'w')
        data_output.write(user_data)
        data_output.close()

The key line here is:

    user_data = user_data.replace(col[ORIGINAL_COLUMN], col[TARGET_COLUMN])

Because the result of replace is assigned back to user_data, each replacement carries over to the next iteration.

I need to do the same but with the 're' library:

    with open(KEY_FILE, mode='r', encoding='utf-8-sig') as f:
        replacements = csv.DictReader(f)
        user_data = open(temp_file, 'r').read()
        a = open(f"{test_file}", 'w')

        for col in replacements:
            original_str = col[ORIGINAL_COLUMN]
            target_str = col[TARGET_COLUMN]
            compiled = re.compile(re.escape(original_str), re.IGNORECASE)
            result = compiled.sub(target_str, user_data)
            a.write(result)

I only end up with the last item in the .csv dict changed in the output file. Can't seem to get the changes made in previous iterations of the for loop to persist.

I know that it is pulling from the same file each time... which is why it is getting reset each loop, but I can't sort out a workaround.

Thanks

Upvotes: 0

Views: 544

Answers (1)

jwd

Reputation: 11114

Try something like this?

    #!/usr/bin/env python3

    import csv
    import re
    import sys
    from io import StringIO

    KEY_FILE = '''aaa,bbb
    xxx,yyy
    '''
    TEMP_FILE = '''here is aaa some text xxx
    bla bla aaaxxx
    '''
    ORIGINAL_COLUMN = 'FROM'
    TARGET_COLUMN = 'TO'

    user_data = StringIO(TEMP_FILE).read()

    with StringIO(KEY_FILE) as f:
        reader = csv.DictReader(f, ['FROM','TO'])
        for row in reader:
            original_str = row[ORIGINAL_COLUMN]
            target_str = row[TARGET_COLUMN]
            compiled = re.compile(re.escape(original_str), re.IGNORECASE)
            user_data = compiled.sub(target_str, user_data)

    sys.stdout.write("modified user_data:\n" + user_data)

Some things to note:

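  • The main problem was result = sub(..., user_data) rather than result = sub(..., result): every iteration applied the substitution to the original user_data, so the earlier replacements were thrown away. You want to keep updating the same string, and write it out once after the loop rather than on every iteration.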
  • Compiling each regex is fairly pointless in this case, since each pattern is used only once; re.sub(pattern, repl, string, flags=re.IGNORECASE) does the same job.
  • I don't have access to your test files, so I used StringIO versions inline and printing to stdout; hopefully that's easy enough to translate back to your real code (:
    • In future posts, you might consider doing similar, so that your question has 100% runnable code someone else can try out without guessing.
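
Translated back to real files, the notes above could look roughly like the sketch below. It is only a sketch under some assumptions: the function names apply_replacements and rewrite_file are made up for illustration, the key file is assumed to have a header row naming the two columns (as the DictReader call in the question implies), and the data file is assumed to be UTF-8. The lambda is there so that a backslash in a target string is inserted literally instead of being read as a regex escape.

```python
import csv
import re


def apply_replacements(text, pairs):
    """Apply each (original, target) pair to text, ignoring case."""
    for original, target in pairs:
        # Feed the previous result back in so earlier substitutions persist.
        text = re.sub(re.escape(original), lambda m: target, text,
                      flags=re.IGNORECASE)
    return text


def rewrite_file(key_file, data_file, out_file,
                 original_column="FROM", target_column="TO"):
    """Read replacement pairs from a CSV key file and write the result once."""
    # Assumption: the key file starts with a header row naming both columns.
    with open(key_file, mode="r", encoding="utf-8-sig", newline="") as f:
        pairs = [(row[original_column], row[target_column])
                 for row in csv.DictReader(f)]

    # Assumption: the data file is UTF-8 encoded.
    with open(data_file, mode="r", encoding="utf-8") as f:
        user_data = f.read()

    # Write a single time, after every replacement has been applied.
    with open(out_file, mode="w", encoding="utf-8") as f:
        f.write(apply_replacements(user_data, pairs))
```

Something like rewrite_file("keys.csv", "input.txt", "output.txt") would then do the whole job, with those file names standing in for your own KEY_FILE and temp_file. Keeping the string logic in its own function also makes it easy to try out on plain strings without touching any files.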

Upvotes: 1
