Juli Goe

Reputation: 75

Edit textfile with python re

I have a large text file (>200 MB) with 70k rows. I want to find a specific piece of text in each row and place it again at the end of that row, separated by a $ symbol. Notepad++ with a regex works but is a bit slow, so I want to try it with Python.

With the code below I get "TypeError: must be str, not list" for fn.write(text+run+"\n"):

import re

with open('Testfile.txt', mode='r', encoding='utf-8', errors='ignore') as f:
    for line in f.readlines():
        text = line.replace("\n","$")
        run = re.findall(r'Typ: (.*?);', line) or "0"
        print(text+run)

        with open ("NEWTest.txt", mode="w") as fn:
            fn.write(text+run+"\n")

Does anyone know what causes the error, and whether there is an even faster way to do this in code?

Upvotes: 1

Views: 75

Answers (2)

MatsLindh

Reputation: 52832

re.findall returns a list of strings - i.e. all matches found in the text given to it.

>>> import re
>>> re.findall(r'Typ: (.*?);', 'Typ: foobar;')
['foobar']

To include it again at the end, you can join all the matches together:

fn.write(text + ''.join(run) + "\n")
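To see what joining does when a row contains more than one match, here is a small sketch. The sample row and the comma separator are made up for illustration and are not taken from the question's file.

```python
import re

# Hypothetical row with two "Typ:" fields (illustration only).
line = "Typ: Haus; Ort: Berlin; Typ: Wohnung;"
matches = re.findall(r'Typ: (.*?);', line)

joined_plain = ''.join(matches)  # matches are glued together with no separator
joined_sep = ','.join(matches)   # the comma is an arbitrary choice of separator

# With no match, findall returns an empty list, so `or "0"` supplies the fallback.
fallback = ''.join(re.findall(r'Typ: (.*?);', "no type here") or "0")

print(matches, joined_plain, joined_sep, fallback)
```

If a row can contain several matches, a non-empty separator is probably what you want, since ''.join runs the values together.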

If you only want one match, you can use re.search instead:

>>> re.search(r'Typ: (.*?);', 'Typ: foobar;').group(1)
'foobar'

It works with your example:

>>> re.search(r'Typ: (.*?);', 'Typ: Mehrfamilienhaus;').group(1)
'Mehrfamilienhaus'

But re.search returns None when there is no match, so check the result before retrieving the group (this plays the same role as the `or "0"` fallback in your findall code):

result = re.search(r'Typ: (.*?);', 'Typ: Mehrfamilienhaus;')
run = result.group(1) if result else '0'
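Putting this together for the whole file, a minimal sketch could look like the following. The helper names append_typ and convert are my own, not from the question. It also assumes you want every row in the output: the question's code opens NEWTest.txt with mode="w" inside the loop, which truncates the file on each iteration and leaves only the last row.

```python
import re

# Compile the pattern once and reuse it for every row.
TYP_PATTERN = re.compile(r'Typ: (.*?);')


def append_typ(line, default="0"):
    """Return the row with '$' plus the first Typ value (or default) appended."""
    match = TYP_PATTERN.search(line)
    value = match.group(1) if match else default
    # rstrip("\n") also handles a final row that has no trailing newline
    return line.rstrip("\n") + "$" + value + "\n"


def convert(src_path, dst_path):
    """Stream src_path to dst_path, opening each file exactly once."""
    with open(src_path, encoding="utf-8", errors="ignore") as src, \
            open(dst_path, mode="w", encoding="utf-8") as dst:
        for line in src:  # lazy iteration, no readlines() list held in memory
            dst.write(append_typ(line))
```

You would call it as convert("Testfile.txt", "NEWTest.txt"). Iterating over the file object directly and opening the output only once should also be noticeably faster than reopening it for each of the 70k rows, though I have not timed it on your data.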

Upvotes: 1

Hayat

Reputation: 1639

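Try joining the list that re.findall returns, and open the output file only once:

import re

# Open both files once; reopening the output with mode="w" for every line
# truncates it each time and keeps only the last row.
with open('Testfile.txt', mode='r', encoding='utf-8', errors='ignore') as f, \
        open("NEWTest.txt", mode="w", encoding='utf-8') as fn:
    for line in f:  # iterate lazily instead of f.readlines()
        text = line.replace("\n", "$")
        # findall returns a list of matches; fall back to "0" if it is empty
        run = re.findall(r'Typ: (.*?);', line) or "0"
        fn.write(text + ''.join(run) + "\n")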

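Or use re.search, which returns a match object (or None) rather than a string, so take the captured group before concatenating:

import re

with open('Testfile.txt', mode='r', encoding='utf-8', errors='ignore') as f, \
        open("NEWTest.txt", mode="w", encoding='utf-8') as fn:
    for line in f:
        text = line.replace("\n", "$")
        match = re.search(r'Typ: (.*?);', line)
        # use the captured group, or "0" when nothing matched
        run = match.group(1) if match else "0"
        fn.write(text + run + "\n")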

Upvotes: 2
