user7145588

Reputation: 37

How to read a file from end to start (in reverse order) in Tcl?

I have a very large text file from which I have to extract some data. I read the file line by line and look for keywords. I know that the keywords I am looking for are much closer to the end of the file than to the beginning, so I tried reading the file through tac:

set fh [open "|tac filename"]

but I get this error: couldn't execute "tac": no such file or directory

The file is too big for me to read all the lines into memory and reverse them. Please suggest a solution.

Upvotes: 0

Views: 1342

Answers (2)

Eric Melski

Reputation: 16790

tac is itself a fairly simple program -- you could just implement its algorithm in Tcl, at least if you're determined to literally read each line in reverse order. However, I think that constraint is not really necessary -- you said that the content you're looking for is more likely to be near the end than near the beginning, not that you had to scan the lines in reverse order. That means you can do something a little bit simpler. Roughly speaking:

  1. Seek to an offset near the end of the file.
  2. Read line-by-line as normal, until you hit data you've already processed.
  3. Seek to an offset a bit further back from the end of the file.
  4. Read line-by-line as normal, until you hit data you've already processed.
  5. etc.

This way you don't actually have to keep anything more in memory than the single line you're processing right now, and you'll process the data at the end of the file before data earlier in the file. Maybe you could eke out a tiny bit more performance by strictly processing the lines in reverse order but I doubt it will matter compared to the advantage you gain by not scanning from start to finish.

Here's some sample code that implements this algorithm. Note the bit of care taken to avoid processing a partial line:

set BLOCKSIZE  16384
set offset     [file size $filename]
set lastOffset [file size $filename]

set f [open $filename r]
while { 1 } {
    seek $f $offset

    if { $offset > 0 } {
        # We may have accidentally read a partial line, because we don't
        # know where the line boundaries are.  Skip to the end of whatever
        # line we're in, and discard the content.  We'll get it instead
        # at the end of the _next_ block.

        gets $f
        set offset [tell $f]
    }

    while { [tell $f] < $lastOffset } {
        set line [gets $f]

        ### Do whatever you're going to do with the line here

        puts $line
    }

    set lastOffset $offset
    if { $lastOffset == 0 } {
        # All done, we just processed the start of the file.

        break
    }

    set offset [expr {$offset - $BLOCKSIZE}]
    if { $offset < 0 } {
        set offset 0
    }
}
close $f
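For the question's actual goal (searching for a keyword), the processing hook can also terminate the scan as soon as a match is found. Here is a self-contained sketch of the same block-wise scan with that change; the sample file contents, the keyword, and the small block size are invented purely for the demonstration:

```tcl
# Demo: create a sample file, then scan it from the end in BLOCKSIZE
# chunks, stopping as soon as a line containing the keyword is found.
# The file contents and the keyword "KEYWORD" are made up for this demo.
set filename "demo.txt"
set f [open $filename w]
for {set i 1} {$i <= 1000} {incr i} {
    puts $f "line $i"
}
puts $f "KEYWORD near the end"
puts $f "last line"
close $f

set BLOCKSIZE  256
set offset     [file size $filename]
set lastOffset [file size $filename]
set match      ""

set f [open $filename r]
while {$match eq ""} {
    seek $f $offset

    if {$offset > 0} {
        # Skip and discard a possibly partial line; it will be read in
        # full at the end of the next (earlier) block.
        gets $f
        set offset [tell $f]
    }

    while {[tell $f] < $lastOffset} {
        set line [gets $f]
        if {[string first "KEYWORD" $line] >= 0} {
            set match $line
            break
        }
    }

    set lastOffset $offset
    if {$lastOffset == 0} {
        # Reached the start of the file without a match.
        break
    }

    set offset [expr {$offset - $BLOCKSIZE}]
    if {$offset < 0} {
        set offset 0
    }
}
close $f
file delete $filename

puts "found: $match"
```

The only structural change from the code above is the early exit: the inner loop breaks on a match, and the outer loop's condition terminates once $match is set.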

Upvotes: 3

Donal Fellows

Reputation: 137567

The cost of reversing a file is actually fairly high. The best option I can think of is to construct a list of file offsets of the starts of lines, and then to use a seek;gets pattern to go over that list.

set f [open $filename]

# Construct the list of indices; check the result of gets so that we
# don't record a bogus index for the end-of-file position (a plain
# "while {![eof $f]}" loop would append one extra entry at EOF)
set indices {}
while {1} {
    set pos [tell $f]
    if {[gets $f line] < 0} break
    lappend indices $pos
}

# Iterate backwards
foreach idx [lreverse $indices] {
    seek $f $idx
    set line [gets $f]

    DoStuffWithALine $line
}

close $f

The cost of this approach is non-trivial: the indexing pass has to read the whole file anyway, and even if you happened to have a cache of the indices, the backwards seek-and-read traversal works poorly with how the OS pre-fetches disk data.
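The index-caching idea mentioned above could be sketched as follows. The ".idx" suffix and the mtime-based invalidation are assumptions for illustration, not an established convention:

```tcl
# Hypothetical sketch: cache the line-start offsets beside the data file
# so the forward indexing pass only happens once per file version.
proc lineIndices {filename} {
    set cache "$filename.idx"

    # Reuse the cache if it is at least as new as the data file.
    if {[file exists $cache] &&
        [file mtime $cache] >= [file mtime $filename]} {
        set c [open $cache r]
        set indices [read $c]
        close $c
        return $indices
    }

    # Build the index with a forward scan, checking the result of gets
    # so no bogus entry is recorded for the end-of-file position.
    set indices {}
    set f [open $filename r]
    while {1} {
        set pos [tell $f]
        if {[gets $f line] < 0} break
        lappend indices $pos
    }
    close $f

    # Persist the index for next time.
    set c [open $cache w]
    puts -nonewline $c $indices
    close $c
    return $indices
}
```

Even with a cached index, the backwards seek;gets traversal itself still fights the OS read-ahead, which is the point made above.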

Upvotes: 0
