Reputation: 37
I have a very large text file from which I have to extract some data. I read the file line by line and look for keywords. I know that the keywords I am looking for are much closer to the end of the file than to the beginning, so I tried using tac: set fh [open "|tac filename"]. However, I get this error: couldn't execute "tac": no such file or directory
My file is too big for me to store the lines in a loop and then reverse them. Please suggest a solution.
Upvotes: 0
Views: 1342
Reputation: 16790
tac is itself a fairly simple program; you could just implement its algorithm in Tcl, at least if you're determined to literally read each line in reverse order. However, I think that constraint is not really necessary: you said that the content you're looking for is more likely to be near the end than near the beginning, not that you had to scan the lines in reverse order. That means you can do something a little bit simpler. Roughly speaking: seek to an offset one block before the end of the file, read line by line as normal until you reach data you've already processed, then seek one block further back and repeat until you have processed the start of the file.
This way you don't actually have to keep anything more in memory than the single line you're processing right now, and you'll process the data at the end of the file before data earlier in the file. Lines within a block are still read front to back. Maybe you could eke out a tiny bit more performance by strictly processing the lines in reverse order, but I doubt it will matter compared to the advantage you gain by not scanning from start to finish.
Here's some sample code that implements this algorithm. Note the bit of care taken to avoid processing a partial line:
set BLOCKSIZE 16384
set f [open $filename r]
# pos is where each block's seek lands; lastOffset marks the start of the
# region that has already been processed.
set pos [file size $filename]
set lastOffset $pos
while { 1 } {
    seek $f $pos
    set start $pos
    if { $pos > 0 } {
        # We may have landed in the middle of a line, because we don't
        # know where the line boundaries are. Skip to the end of whatever
        # line we're in, and discard the content. We'll get it instead
        # at the end of the _next_ block.
        gets $f
        set start [tell $f]
    }
    while { [tell $f] < $lastOffset } {
        set line [gets $f]
        ### Do whatever you're going to do with the line here
        puts $line
    }
    # Only move the boundary back if this block reached new data. A line
    # longer than BLOCKSIZE can leave start at lastOffset for a few rounds.
    if { $start < $lastOffset } {
        set lastOffset $start
    }
    if { $pos == 0 } {
        # All done, we just processed the start of the file.
        break
    }
    # Step back from the raw seek position, not from the adjusted line
    # start; otherwise a line longer than BLOCKSIZE would loop forever.
    set pos [expr {max(0, $pos - $BLOCKSIZE)}]
}
close $f
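Since the goal in the question is to find a keyword that sits near the end, the same block-wise scan can stop as soon as a block contains a match. Here is a rough sketch of that idea, not a definitive implementation. The proc name findNearEnd, its parameters, and the use of string match for the keyword test are my own assumptions for illustration, and try/finally assumes Tcl 8.6 or later.

```tcl
# Hypothetical helper: scan a file in fixed-size blocks starting from the
# end and return the last line that matches a glob pattern, or "" if no
# line matches.
proc findNearEnd {filename pattern {blockSize 16384}} {
    set f [open $filename r]
    try {
        set pos [file size $filename]
        set lastOffset $pos
        while {1} {
            seek $f $pos
            set start $pos
            if {$pos > 0} {
                # Discard the (possibly partial) line we landed in.
                gets $f
                set start [tell $f]
            }
            # Lines inside a block are read front to back, so remember the
            # latest match seen in this block.
            set found 0
            while {[tell $f] < $lastOffset} {
                set line [gets $f]
                if {[string match $pattern $line]} {
                    set match $line
                    set found 1
                }
            }
            if {$found} {
                # Blocks are visited from the end, so this is the last match.
                return $match
            }
            if {$start < $lastOffset} {
                set lastOffset $start
            }
            if {$pos == 0} {
                return ""
            }
            set pos [expr {max(0, $pos - $blockSize)}]
        }
    } finally {
        close $f
    }
}
```

You would call it as something like findNearEnd $filename "*keyword*". Only the blocks from the end of the file back to the first block with a match get read, which is where the saving comes from if the keyword really is near the end.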
Upvotes: 3
Reputation: 137567
The cost of reversing a file is actually fairly high. The best option I can think of is to construct a list of file offsets of the starts of lines, and then to use a seek;gets pattern to go over that list.
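set f [open $filename]
# Construct the list of indices
set indices {}
while {1} {
    set idx [tell $f]
    # Stop at end of file, so we don't record an offset that has no line
    # after it (checking eof before gets would add one extra empty entry).
    if {[gets $f line] < 0} break
    lappend indices $idx
}
# Iterate backwards
foreach idx [lreverse $indices] {
    seek $f $idx
    set line [gets $f]
    DoStuffWithALine $line
}
close $f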
The cost of this approach is non-trivial (even if you happened to have a cache of the indices, you'd still have issues with it) as it doesn't work well with how the OS pre-fetches disk data.
Upvotes: 0