lsch91

Reputation: 335

Python Delete a specific Line number

I'm trying to delete a specific line (10884121) from a text file that is about 30 million lines long. This is the method I first attempted; however, when I execute it, it runs for about 20 seconds and then gives me a "memory error". Is there a better way to do this? Thanks!

import fileinput
import sys

f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'

with open(f_in, 'r') as fin:
    with open(f_out, 'w') as fout:
        linenums = [10884121]
        s = [y for x, y in enumerate(fin) if x not in [line - 1 for line in linenums]]
        fin.seek(0)
        fin.write(''.join(s))
        fin.truncate(fin.tell())

Upvotes: 1

Views: 2496

Answers (4)

Paulo Scardine

Reputation: 77281

How about a generic file filter function?

def file_filter(file_path, condition=None):
    """Yield lines from a file if condition(n, line) is true.
       The condition parameter is a callback that receives two
       parameters: the line number (first line is 1) and the 
       line content."""

    if condition is None:
        condition = lambda n, line: True

    with open(file_path) as source:
        for n, line in enumerate(source):
            if condition(n + 1, line):
                yield line

with open(f_out, 'w') as destination:
    condition = lambda n, line: n != 10884121

    for line in file_filter(f_in, condition):
        destination.write(line)

Upvotes: 0

dheeraj .A

Reputation: 1117

There is a good chance you are running out of memory because you are trying to store the whole file in a list. Try the following instead:

f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
linenums = {10884121}  # 1-based line numbers to drop

# Stream the input one line at a time so the whole file never sits in memory.
with open(f_in, 'r') as _fileOne, open(f_out, 'w') as _fileTwo:
    # start=1 so lineNumber matches the 1-based numbers in linenums
    for lineNumber, line in enumerate(_fileOne, start=1):
        if lineNumber not in linenums:
            _fileTwo.write(line)

Here we read the file line by line and skip the lines that are not needed, so memory use stays small no matter how large the file is. You can also try reading the file using buffering. Hope this helps.
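
The buffering suggestion above could look something like the following. This is only a sketch: the helper name `copy_without_lines` and the 1 MiB buffer size are assumptions made for illustration, not anything from the question. The `buffering` argument itself is a standard parameter of the built-in `open`.

```python
def copy_without_lines(src_path, dst_path, skip, buffer_size=1024 * 1024):
    """Copy src_path to dst_path, omitting the 1-based line numbers in skip.

    Hypothetical helper; returns the number of lines written.
    """
    skip = set(skip)  # set membership tests are O(1)
    written = 0
    # buffering=buffer_size asks Python to read/write in larger chunks;
    # iteration still yields one line at a time.
    with open(src_path, 'r', buffering=buffer_size) as src, \
         open(dst_path, 'w', buffering=buffer_size) as dst:
        for lineno, line in enumerate(src, start=1):
            if lineno not in skip:
                dst.write(line)
                written += 1
    return written
```

With the paths from the question it would be called as `copy_without_lines(f_in, f_out, [10884121])`. Whether a larger buffer actually speeds things up depends on the disk and the platform, so it is worth timing both variants.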

Upvotes: 0

Alex

Reputation: 95

Please try to use:

import fileinput

f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'

f = open(f_out,'w')

counter=0

for line in fileinput.input([f_in]):
    counter=counter+1
    if counter != 10884121:
        f.write(line)  # line already ends with '\n'; text mode translates it to os.linesep, so no extra newline is needed

f.close()  # close explicitly so buffered output is flushed to disk

Upvotes: 0

First of all, you were not using the imports; you were trying to write to the input file, and your code read the whole file into memory.

Something like this might do the trick with less hassle - we read line by line, use enumerate to count the line numbers; and for each line we write it to output if its number is not in the list of ignored lines:

f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'

ignored_lines = [10884121]
with open(f_in, 'r') as fin, open(f_out, 'w') as fout:
    for lineno, line in enumerate(fin, 1):
        if lineno not in ignored_lines:
            fout.write(line)
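
The original attempt seemed to aim at changing the file itself rather than producing a second file. If that is the goal, one possible approach (a sketch, not the only way) is to stream into a temporary file in the same directory and then swap it over the original. The function name `delete_lines_in_place` is hypothetical; `tempfile.NamedTemporaryFile` and `os.replace` are standard library calls.

```python
import os
import tempfile


def delete_lines_in_place(path, ignored_lines):
    """Remove the given 1-based line numbers from the file at path.

    Hypothetical helper: streams to a temporary file next to the
    original, then replaces the original with it.
    """
    ignored = set(ignored_lines)
    directory = os.path.dirname(os.path.abspath(path))
    # delete=False so the temp file survives the with-block and can be renamed
    with open(path, 'r') as fin, tempfile.NamedTemporaryFile(
            'w', dir=directory, delete=False) as fout:
        tmp_path = fout.name
        for lineno, line in enumerate(fin, 1):
            if lineno not in ignored:
                fout.write(line)
    # Both files are closed at this point, so the swap also works on Windows.
    os.replace(tmp_path, path)
```

This still needs enough free disk space for a second copy of the file while it runs, but memory use stays constant.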

Upvotes: 2
