James

Reputation: 125

Regexp loop to find the first instance of each query in Tcl

I have a list variable containing some values:

lappend list {query1} {query2} {query3}

And file1 contains data lines that start with those values:

query1 first data 
query1 different data
query1 different data
query2 another data  
query2 random data 
query3 data something 
query3 last data 

How do I create a regexp loop that catches only the first instance found of each query and prints them out? In this case the output would be:

query1 first data
query2 another data 
query3 data something

Attempted code to produce the output

set readFile1 [open file1.txt r]
while { [gets $readFile1 data] > -1 } {
    for { set n 0 } { $n < [llength $list] } { incr n } {
        if { [regexp "[lindex $list $n]" $data] } {
            puts $data
        }
    }
}
close $readFile1

I tried using a for loop while reading the data from a file, but it seems to catch all values even if the -all option is not used.

Upvotes: 2

Views: 851

Answers (4)

Peter Lewerin

Reputation: 13252

package require fileutil

set queries {query1 query2 query3}
set result {}
::fileutil::foreachLine line file1.txt {
    foreach query $queries {
        if {![dict exists $result $query]} {
            if {[regexp $query $line]} {
                dict set result $query $line
                puts $line
            }
        }
    }
}

The trick here is to store the findings in a dictionary. If there is a value corresponding to the query in the dictionary already, we don’t search for it again. This also has the advantage that the found lines are available to the script after the search and aren’t just printed out. The regexp search looks for the query string anywhere in the line: if it should only be in the beginning of the line, use regexp ^$query $line instead.
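As a minimal sketch of the anchored variant mentioned above: the in-memory list of lines below is made-up sample data standing in for file1.txt, and the extra first line is an assumption added only to show the difference anchoring makes. With an unanchored pattern, that line would be recorded as the first hit for query2.

```tcl
set queries {query1 query2 query3}

# Hypothetical sample data standing in for the lines of file1.txt.
set lines [list \
    "note about query2" \
    "query1 first data" \
    "query1 different data" \
    "query2 another data" \
    "query3 data something" \
    "query3 last data"]

set result [dict create]
foreach line $lines {
    foreach query $queries {
        # Skip queries that already have a recorded first match.
        if {[dict exists $result $query]} continue
        # ^ anchors the match to the start of the line.
        if {[regexp "^$query" $line]} {
            dict set result $query $line
        }
    }
}

# A dict keeps insertion order, so lines print in the order they were found.
dict for {query line} $result {
    puts $line
}
```

Here "note about query2" is skipped because query2 does not appear at the start of that line. If the queries could contain regular expression metacharacters, they would need escaping first; this sketch assumes plain words.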

Documentation: dict, fileutil package, foreach, if, package, puts, regexp, set

Upvotes: 2

glenn jackman

Reputation: 246807

This approach does not use regexp at all. I assume your queries do not contain whitespace.

set list [list query1 query2 query3]
array set seen {}
set fh [open file1]
while {[gets $fh line] != -1} {
    set query [lindex [split $line] 0]
    if {$query in $list && $query ni [array names seen]} {
        set seen($query) 1
        puts $line
    }
}
close $fh

Output:

query1 first data
query2 another data
query3 data something

Upvotes: 1

Dinesh

Reputation: 16428

If the text file is small, you can read it as a whole into a variable with the read command, then apply regexp to the content to extract the required data.

set list {query1 query2 query3}
set fp [open file1.txt r]
set data [read $fp]
close $fp
foreach elem $list {
    # '-line' flag will enable the line sensitive matching
    if {[regexp -line "$elem.+" $data line]} {
        puts $line
    }
}  

If the file is too large to hold in memory, or if run-time memory usage is a concern, read the content line by line instead. In that case you need to track what has already matched, which you can do with an array that records whether the first occurrence of each query has been found.

set list {query1 query2 query3}
set fp [open file1.txt r]
array set first_occurence {}
while {[gets $fp line]!=-1} {
    foreach elem $list {
        if {[info exists first_occurence($elem)]} {
            continue
        }
        if {[regexp $elem $line]} {
            set first_occurence($elem) 1
            puts $line
        }
    }
}
close $fp

Reference: regexp

Upvotes: 2

toxic_boi_8041

Reputation: 1482

Try this:

set fd [open "query_file.txt" r]
set data [read $fd]
set uniq_list ""
foreach l [split $data "\n"] {
    lappend uniq_list [lindex $l 0]
}

set uniq_list [lsort -unique $uniq_list]

foreach l $uniq_list {
    if {[string equal $l ""]} {
        continue
    }
    foreach line [split $data "\n"] {
        if {[regexp $l $line]} {
            puts "$line"
            break
        }
    }
}

close $fd

References: file, list, regexp

Upvotes: 1
