Reputation: 335
Im trying to delete a specific line (10884121) in a text file that is about 30 million lines long. This is the method I first attempted, however, when I execute it runs for about 20 seconds then gives me a "memory error". Is there a better way to do this? Thanks!
import fileinput
import sys
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
with open(f_in, 'r') as fin:
with open(f_out, 'w') as fout:
linenums = [10884121]
s = [y for x, y in enumerate(fin) if x not in [line - 1 for line in linenums]]
fin.seek(0)
fin.write(''.join(s))
fin.truncate(fin.tell())
Upvotes: 1
Views: 2496
Reputation: 77281
How about a generic file filter function?
def file_filter(file_path, condition=None):
"""Yield lines from a file if condition(n, line) is true.
The condition parameter is a callback that receives two
parameters: the line number (first line is 1) and the
line content."""
if condition is None:
condition = lambda n, line: True
with open(file_path) as source:
for n, line in enumerate(source):
if condition(n + 1, line):
yield line
open(f_out, 'w') as destination:
condition = lambda n, line: n != 10884121
for line in file_filter(f_in, condition):
destination.write(line)
Upvotes: 0
Reputation: 1117
There are high chances that you run out of memory since you are trying to store file into list. Try this below:
import fileinput
import sys
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
_fileOne = open(f_in,'r')
_fileTwo = open(f_out,'w')
linenums = set([10884121])
for lineNumber, line in enumerate(_fileOne):
if lineNumber not in linenums:
_fileTwo.writeLine(line)
_fileOne.close()
_fileTwo.close()
Here we are reading file line by line and excluding lines which are not needed, this may not run out of memory. You can also try reading file using buffering. Hope this helps.
Upvotes: 0
Reputation: 95
Please try to use:
import fileinput
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
f = open(f_out,'w')
counter=0
for line in fileinput.input([f_in]):
counter=counter+1
if counter != 10884121:
f.write(line) # python will convert \n to os.linesep, maybe you need to add a os.linesep, check
f.close() # you can omit in most cases as the destructor will call it
Upvotes: 0
Reputation: 133929
First of all, you were not using the imports; you were trying to write to the input file, and your code read the whole file into memory.
Something like this might do the trick with less hassle - we read line by line,
use enumerate
to count the line numbers; and for each line we write it to output if its number is not in the list of ignored lines:
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
ignored_lines = [10884121]
with open(f_in, 'r') as fin, open(f_out, 'w') as fout:
for lineno, line in enumerate(fin, 1):
if lineno not in ignored_lines:
fout.write(line)
Upvotes: 2