Reputation: 75
I have a large text file (>200 MB) with 70k rows. I want to find a specific piece of text in each row and append it to the end of that row, separated by a $ symbol. Notepad++ with a regex works but is a bit slow, so I want to try it with Python.
With the code below I get "TypeError: must be str, not list" for the line fn.write(text+run+"\n"):
import re
with open('Testfile.txt', mode='r', encoding='utf-8', errors='ignore') as f:
    for line in f.readlines():
        text = line.replace("\n","$")
        run = re.findall(r'Typ: (.*?);', line) or "0"
        print(text+run)
        with open ("NEWTest.txt", mode="w") as fn:
            fn.write(text+run+"\n")
Does anyone know what causes the error, and whether there is a faster way to do this?
Upvotes: 1
Views: 75
Reputation: 52832
re.findall returns a list of strings, i.e. all matches found in the text given to it.
>>> import re
>>> re.findall(r'Typ: (.*?);', 'Typ: foobar;')
['foobar']
To include it again at the end, you can join all the matches together:
fn.write(text + ''.join(run) + "\n")
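As a minimal sketch of that join step in isolation: the helper name append_matches, its sep parameter, and the sample row are made up for illustration and are not part of the question's code. It shows that the list returned by re.findall has to be turned into a string before it can be concatenated, and that an empty list can fall back to "0".

```python
import re

def append_matches(line, sep="$"):
    """Return the line with all 'Typ: ...;' matches appended after sep.

    Hypothetical helper for illustration only.
    """
    text = line.rstrip("\n") + sep
    matches = re.findall(r'Typ: (.*?);', line)  # always a list, possibly empty
    # text + matches would raise TypeError (str + list), so join to a str first.
    # An empty list joins to '', in which case "0" is used instead.
    return text + (''.join(matches) or "0")

# Sample row invented for this example
print(append_matches("Id: 1; Typ: Mehrfamilienhaus; Ort: X;\n"))
```

If a row contains several matches, they are concatenated without a separator; passing a different string to join (for example ','.join(matches)) would keep them apart.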
If you only want one match, you can use re.search instead:
>>> re.search(r'Typ: (.*?);', 'Typ: foobar;').group(1)
'foobar'
It works with your example:
>>> re.search(r'Typ: (.*?);', 'Typ: Mehrfamilienhaus;').group(1)
'Mehrfamilienhaus'
But since re.search returns None if there isn't a match, you should check that you got a proper match before attempting to retrieve the group (as you do with the "or" fallback after findall above):
result = re.search(r'Typ: (.*?);', 'Typ: Mehrfamilienhaus;')
run = result.group(1) if result else '0'
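Putting the pieces together for the whole file, here is one possible sketch rather than a definitive solution. The names TYP, tag_line and tag_file are hypothetical. It assumes one match per row is enough, opens the output file once instead of once per row, and iterates over the input lazily instead of calling readlines(). Whether this is noticeably faster on your data is something you would have to measure.

```python
import re

# Compiled once and reused for every row (hypothetical name)
TYP = re.compile(r'Typ: (.*?);')

def tag_line(line):
    """Return the row with its first 'Typ' value (or '0') appended after '$'."""
    result = TYP.search(line)
    run = result.group(1) if result else '0'
    return line.rstrip('\n') + '$' + run + '\n'

def tag_file(src, dst):
    """Stream src row by row and write the tagged rows to dst."""
    with open(src, mode='r', encoding='utf-8', errors='ignore') as f, \
            open(dst, mode='w', encoding='utf-8') as fn:
        for line in f:  # reads one row at a time, not the whole file
            fn.write(tag_line(line))
```

With the file names from the question, the call would be tag_file('Testfile.txt', 'NEWTest.txt').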
Upvotes: 1
Reputation: 1639
Try
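import re

# Open the output file once; reopening it with mode="w" for each row would overwrite it
with open('Testfile.txt', mode='r', encoding='utf-8', errors='ignore') as f, \
        open('NEWTest.txt', mode='w', encoding='utf-8') as fn:
    for line in f:
        text = line.replace("\n", "$")
        # findall returns a list, so join the matches into one string
        run = re.findall(r'Typ: (.*?);', line) or "0"
        fn.write(text + ''.join(run) + "\n")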
Or
import re

with open('Testfile.txt', mode='r', encoding='utf-8', errors='ignore') as f, \
        open('NEWTest.txt', mode='w', encoding='utf-8') as fn:
    for line in f:
        text = line.replace("\n", "$")
        # re.search returns a match object or None, so extract group(1) explicitly
        match = re.search(r'Typ: (.*?);', line)
        run = match.group(1) if match else "0"
        fn.write(text + run + "\n")
Upvotes: 2