Reputation: 2821
I have a very large text file from which I have to extract some data. I read the file line by line and look for keywords. Since the keywords I am looking for are much closer to the end of the file than to the beginning, I wonder if it is possible to read the file starting at the last line instead of the first. I would then use an additional keyword which indicates "everything beyond this word is not of interest" and stop reading.
Is that possible?
Upvotes: 1
Views: 2563
Reputation: 1
To reverse the file, I read it into a variable "list" line by line, prepending each current line to $list. That way "list" ends up in the reverse order of the file:
set in [open "filename.txt"]
set list {}
while {[gets $in line] > -1} {
    # skip comment lines
    if {[regexp "#" $line]} {
        continue
    }
    # prepend, which reverses the order in variable "list"
    set list [linsert $list 0 $line]
}
close $in
foreach line $list {
    puts "line= $line"
    # *** process each line as you need ***
}
Upvotes: 0
Reputation: 1
package require struct::list

set fp [open "filename.txt"]
set lines [split [read -nonewline $fp] "\n"]
close $fp
foreach line [struct::list reverse $lines] {
    # do something with $line
}
Upvotes: 0
Reputation: 137567
The simplest way to grab the end of a file for searching, assuming you don't know the size of the records (i.e., the line lengths), is to grab too much and work with that.
set f [open $filename]
# Pick some large value; the more you read, the slower
seek $f -100000 end
# Read to the end, split into lines and *DISCARD FIRST*
set lines [lrange [split [read $f] "\n"] 1 end]
Now you can search with lsearch. (Note that you won't know exactly where in the file your matched line is; if you need that, you have to do quite a lot more work.)
if {[lsearch -glob $lines "*FooBar*"] >= 0} {
...
}
The first line is discarded from the read section because you're probably starting to read halfway through a line; dropping that first “line” means you've only got genuine lines to deal with. (100 kB isn't very much for any modern computer system to search through, but you may be able to constrain it further; it depends on the details of the data.)
Upvotes: 3
Reputation: 246764
I don't know how performant this would be, but you could run the file through tac and read from that:
set fh [open "|tac filename"]
# read from last line to first
while {[gets $fh line] != -1} {...
Another tactic would be to read the last, say, 5000 bytes of the file (using seek), split on newlines and examine those lines, then seek to position 10000 from the end and read the "next" 5000 bytes, and so on; a sketch of that loop follows.
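Here is a minimal sketch of that chunked approach (not from the answer itself; the file name and chunk size are placeholders, and the channel is put in binary mode so that seek offsets count bytes):

set chunk 5000
set f [open "filename" r]
fconfigure $f -translation binary     ;# byte-exact offsets for seek and read
seek $f 0 end
set size [tell $f]
set carry ""                          ;# partial line left over from the chunk after this one
for {set pos $size} {$pos > 0} {} {
    set len [expr {min($chunk, $pos)}]
    set pos [expr {$pos - $len}]
    seek $f $pos start
    set data [read $f $len]
    # append the carried fragment so lines split across chunks are reassembled
    set parts [split $data$carry "\n"]
    set carry [lindex $parts 0]       ;# may be incomplete; completed on the next pass
    foreach line [lreverse [lrange $parts 1 end]] {
        # examine $line here (lines arrive last-to-first)
    }
}
# $carry now holds the complete first line of the file
close $f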
Upvotes: 4
Reputation: 55443
No, it is not possible (in any runtime/language I'm aware of, Tcl included).
So decide on a buffer size and read your file by seeking backwards and trying to read a full buffer each time.
Note that you have to watch out for certain edge cases:
It seems you're dealing with a text file that you want to process line-wise. If so, and if the code is cross-platform or has to work on Windows, you must handle the case where the data placed in the buffer by one read operation starts with LF while the next read operation (of the preceding chunk) ends with CR; that is, your EOL marker can be split across two buffers.
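One way to absorb such a split (a sketch, not from the answer; splitChunk is a hypothetical helper): split each buffer on LF alone and strip a trailing CR from every resulting line, so a CRLF pair that straddles two buffers still yields one clean line once the fragments are rejoined:

# Sketch: turn a backwards-read chunk into complete lines, tolerating a
# CRLF pair split across buffers. $data is the newly read chunk; $carry is
# the leading fragment left over from the previously read (following)
# chunk. Returns the new carry and the list of complete lines.
proc splitChunk {data carry} {
    set parts [split $data$carry "\n"]
    set lines {}
    foreach line [lrange $parts 1 end] {
        lappend lines [string trimright $line "\r"]  ;# drop the CR of a CRLF pair
    }
    return [list [lindex $parts 0] $lines]
}

Inside a backwards seek/read loop like the one sketched in the previous answer, lassign [splitChunk $data $carry] carry lines then yields the finished lines for each chunk.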
You might want to take a look at the implementation of Tcl_GetsObj() in the generic/tclIO.c file in the Tcl source code; it deals with split CRLFs on normal ("forward") reading of a textual string from a file.
Upvotes: 3