GeoEki

Reputation: 437

Extract last line of very long txt file

I have a very long data file ("text.txt") and a second file that contains exactly one line: the last line of text.txt. Because text.txt receives a new line every 10 minutes, this single line should be overwritten every 10 minutes (done by a simple cron job).

Based on other code snippets I found on Stack Overflow, I currently run this code:

#!/usr/bin/env python

import os, sys

file = open(sys.argv[1], "r+")

#Move the pointer (similar to a cursor in a text editor) to the end of the file. 
file.seek(0, os.SEEK_END)

#This code means the following code skips the very last character in the file - 
#i.e. in the case the last line is null we delete the last line 
#and the penultimate one
pos = file.tell() - 1

#Read each character in the file one at a time from the penultimate 
#character going backwards, searching for a newline character
#If we find a new line, exit the search
while pos > 0 and file.read(1) != "\n":
    pos -= 1
    file.seek(pos, os.SEEK_SET)

#So long as we're not at the start of the file, delete all the characters ahead of this position
if pos > 0:
    file.seek(pos, os.SEEK_SET)
    w = open("new.txt",'w')
    file.writelines(pos)
    w.close()

file.close()

With this code I get the error:

TypeError: writelines() requires an iterable argument

(of course). Using file.truncate() I can remove the last line from the original file, but I want to keep it there and just copy that last line to new.txt. I don't understand how to do this with file.seek, so I need help with the last part of the code.

file.readlines() with lines[:-1] does not work properly with such huge files.

Upvotes: 1

Views: 761

Answers (4)

thomaskeefe

Reputation: 2374

Here's how to tail the last 2 lines of a file into a list:

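import os
import subprocess

path = os.path.expanduser('~/path/to/my_file.txt')  # subprocess does not expand "~"
output = subprocess.check_output(['tail', '-n', '2', path])
lines = output.decode().splitlines()  # check_output returns bytes on Python 3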

Now you can get the info you need out of the list lines.
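One detail when picking the last line out of that output: tail's output normally ends with a newline, so splitting on "\n" leaves an empty string as the final element, while splitlines() does not. A small sketch of the difference, using a made-up byte string in place of the real tail output:

```python
# Hypothetical sample standing in for the bytes returned by check_output.
output = b"second to last line\nlast line\n"

text = output.decode()
print(text.split("\n"))    # the trailing newline produces an empty final element
print(text.splitlines())   # no empty final element

# Last non-empty line, whichever way the text was split.
last = [line for line in text.split("\n") if line][-1]
```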

Upvotes: 0

Martin Evans

Reputation: 46759

How about the following approach:

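import os
import sys

max_line_length = 1000  # assumed upper bound on the length of one line, in bytes

with open(sys.argv[1], "rb") as f_long, open("new.txt", "wb") as f_new:
    # Binary mode: Python 3 only allows end-relative seeks on binary files.
    f_long.seek(0, os.SEEK_END)
    f_long.seek(max(f_long.tell() - max_line_length, 0))
    lines = [line for line in f_long.read().split(b"\n") if line]
    f_new.write(lines[-1])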

This seeks to max_line_length before the end of the file and reads the remainder. That text is split into non-empty lines, and the last one is written to new.txt. It relies on the last line being no longer than max_line_length.
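If no safe upper bound on the line length is known, the same idea can be repeated in chunks: read backwards from the end until a newline that precedes the last line shows up. A minimal sketch, assuming the file is opened in binary mode; the function name last_line and the chunk_size default are illustrative choices, not part of the original answer:

```python
import os

def last_line(path, chunk_size=4096):
    """Return the last non-empty line of the file at path, as bytes."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b""
        while pos > 0:
            # Step back by one chunk (or to the start of the file).
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            tail = f.read(step) + tail
            # Stop once a newline remains after ignoring trailing line breaks.
            if b"\n" in tail.rstrip(b"\r\n"):
                break
    # Drop trailing line breaks, then keep what follows the last newline.
    return tail.rstrip(b"\r\n").rsplit(b"\n", 1)[-1]
```

The result could then be written out with something like open("new.txt", "wb").write(last_line("text.txt") + b"\n"). Only the end of the file is read, so the cost does not grow with the size of text.txt.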

Upvotes: 0

Anand S Kumar

Reputation: 90909

In your code, pos is an integer: the position of the newline that precedes the last line, found by scanning backwards from the end of the file.

You cannot call file.writelines(pos), because writelines requires an iterable of strings and pos is a single integer.

Also, you want to write to new.txt, so you should write through w, not file. Example:

if pos > 0:
    # pos is the index of the newline itself, so start one character later
    file.seek(pos + 1, os.SEEK_SET)
    with open("new.txt", "w") as w:
        w.write(file.read())
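For reference, a self-contained sketch of the same backward scan, opened in binary mode so that offsets are plain byte positions. The function name copy_last_line and its parameters are hypothetical, and here pos ends up on the first byte of the last line rather than on the newline before it:

```python
import os

def copy_last_line(src_path, dst_path):
    """Copy the last line of src_path into dst_path, leaving src_path unchanged."""
    with open(src_path, "rb") as src:
        src.seek(0, os.SEEK_END)
        # Start one byte before the end so a trailing newline is skipped.
        pos = src.tell() - 1
        # Walk backwards until the byte just before pos is a newline.
        while pos > 0:
            src.seek(pos - 1)
            if src.read(1) == b"\n":
                break
            pos -= 1
        src.seek(max(pos, 0))
        last = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(last)
```

Because the scan may run all the way to position 0, a file that holds only one line is copied as well.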

Upvotes: 1

Kevin

Reputation: 76194

Not sure why you're opening w, only to close it without doing anything with it. If you want new.txt to contain the last line of file, that is, everything after the newline at pos up to the end, how about:

if pos > 0:
    # pos is the index of the newline itself, so start one character later
    file.seek(pos + 1, os.SEEK_SET)
    with open("new.txt", "w") as w:
        w.write(file.read())

Upvotes: 1
