Reputation: 21
I have a file with over 40k lines in which I need to replace words and lines using regex. I cannot get it to work on my own. Let's say the file looks like:
test >
test >
test >
test >
import re

def start():
    file = input("file: ")
    fread = open(file, "r")
    linelist = fread.readlines()
    fread.close()
    fwrite = open(file, "w")
    line = re.sub(".*(?=>)", " ", str(linelist))
    fwrite.write(line)
    fwrite.close()

start()
But instead of removing test and giving me:
>
>
>
it gives me:
>\n']
and no other lines.
Upvotes: 0
Views: 337
Reputation: 42087
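linelist is a list, and you're converting it to a string with str, so re.sub sees one long string and removes everything before the last >\n']. The newlines appear in that string as the two characters \n rather than as real line breaks, so the greedy .* matches all the way up to the last >. When you run str on a list you get, e.g.: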
In [1]: str([1, 2])
Out[1]: '[1, 2]'
That is the main issue. What you need to do is iterate over the list, apply the substitution to each line separately, and save each modified line.
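A small sketch of the difference, using a hypothetical three-line list in place of the real file contents. The replacement here is an empty string to match the output asked for in the question; that choice is an assumption, and a single space works the same way.

```python
import re

# Hypothetical stand-in for what readlines() returns on the sample file.
linelist = ["test >\n", "test >\n", "test >\n"]

# str() turns the whole list into one string. The newlines become the two
# characters backslash + n, so a greedy .* can run across every "line".
as_text = str(linelist)

# Applying the substitution to each line separately keeps the lines apart.
fixed = [re.sub(r"^.*(?=>)", "", line) for line in linelist]
```

Each element of fixed still ends with its own newline, so the list can be written back with writelines.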
But there is a better way than calling readlines on the file object to get all the lines in a single list; as your file is large, this loads the whole file into memory at once.
I would also suggest using separate files for reading and writing, since you're processing the file line by line and would otherwise be writing back to the file you are still reading. If the result must end up in the same file, I would still write to a separate file and then replace the original with it (shutil.move) once the operations are done.
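One way the write-then-replace idea could look is sketched below. The helper name rewrite_in_place and the choice of tempfile.mkstemp for the intermediate file are assumptions made for illustration, not something the question requires.

```python
import os
import re
import shutil
import tempfile


def rewrite_in_place(path):
    """Replace everything before '>' on each line of path (hypothetical helper)."""
    # Create the temporary file next to the original so the final move
    # stays on the same filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as out_file, open(path) as in_file:
            for line in in_file:
                out_file.write(re.sub(r"^.*(?=>)", " ", line))
        # Only replace the original once every line has been written.
        shutil.move(tmp_path, path)
    except BaseException:
        # Something failed before the move finished: drop the partial file.
        os.remove(tmp_path)
        raise
```

The original file is left untouched if anything fails midway. Note that the replacement file is created with the temporary file's permissions, which may differ from the original's.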
So overall, a better approach would be to iterate over the file object (as it is an iterator) and process each line:
import re

with open('input_file') as in_file, open('output_file', 'w') as out_file:
    for line in in_file:
        modified_line = re.sub(r'^.*(?=>)', ' ', line)
        out_file.write(modified_line)
The file object returned by open is a context manager, so you can use it in a with statement; this has the additional benefit of calling close on the file object for you, so you don't need to close the files manually.
If your pattern is exactly as shown in the example, you can use str.replace; there is no need for regex:
modified_line = line.replace('test ', ' ')
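To see where the two approaches differ, here is a short comparison on two hypothetical lines, one of which does not start with test:

```python
import re

# Hypothetical sample lines; only the first matches the literal prefix.
lines = ["test >\n", "other >\n"]

# str.replace only touches the literal text 'test '.
replaced = [line.replace("test ", " ") for line in lines]

# The regex replaces whatever precedes '>' on the line.
substituted = [re.sub(r"^.*(?=>)", " ", line) for line in lines]
```

So str.replace is enough only when every line has the same known prefix; otherwise the regex is the safer choice.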
Upvotes: 2
Reputation: 407
Test your regular expression in an online Python regex tester. Regular expressions are easy to get wrong, and a tester will tell you whether yours matches what you expect.
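The same kind of check can also be done locally. The following is only a sketch: the pattern is the one from the other answer, and the sample inputs and expected outputs are made up for illustration.

```python
import re

pattern = re.compile(r"^.*(?=>)")

# Hypothetical sample inputs mapped to the output expected for each.
cases = {
    "test >\n": ">\n",
    "a > b >\n": ">\n",            # greedy .* runs up to the last '>'
    "no marker\n": "no marker\n",  # no '>' on the line, so nothing changes
}

results = {text: pattern.sub("", text) for text in cases}
mismatches = {t: r for t, r in results.items() if r != cases[t]}
print(mismatches or "all cases pass")
```

Adding a line from the real file to cases is a quick way to confirm the pattern before running it over all 40k lines.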
If you just need the output, as opposed to a Python script, try Notepad++. It supports regex search and replace and can handle 40,000 lines, as can many other editors. Don't code unless you have to.
Upvotes: 0