Bill Moore

Reputation: 165

Non-greedy capture between parentheses in Tcl: #\(.*?\)

I'm using ActiveState Tcl on a Windows PC to run the following Tcl script. I'm trying to do a non-greedy match with #\(.*?\), but it seems to match greedily into the following statements. Any idea what I'm doing wrong or how to fix this?


proc extract_verilog_instances {text} {

    set rexp {(\w+)\s+(\#\s*\((?:.*?)\)\s*)?(\w+(?:\[\d+\])?)\s*\(}

    # rexp will match any of the following statement types:
    #
    #   module_name instance_name ( 
    #   module_name instance_name[0] (
    #   module_name #(parameter1, parameter2) instance_name (
    #   module_name #(parameter1, parameter2) instance_name[0] (


    set regrun [regexp -inline -all -indices -expanded $rexp $text]

    foreach {m0 m1 m2 m3} $regrun {
        set start_index    [lindex $m0 0]
        set end_index      [lindex $m0 1]
        set module   [string range $text [lindex $m1 0] [lindex $m1 1]]
        set instance [string range $text [lindex $m3 0] [lindex $m3 1]]

        puts "module:$module instance:$instance"
    }
}

set vlog {
    
    second_module #(2) inst2 (.in2(sig2), .out2(sig3));

    third_module inst3 (.in3(sig3), .out3(sig4));

    fourth_module #(.in4_clk_freq(50), .in4_rst_val(1'b0)) inst4 (.in4_clk(clk), .in4_rst(rst), .in4_in1(sig4), .in4_in2(sig5), .out4(sig6));
}

extract_verilog_instances $vlog

Expected output:

module:second_module instance:inst2
module:third_module instance:inst3
module:fourth_module instance:inst4

Actual output:

module:second_module instance:inst4

Upvotes: 2

Views: 59

Answers (1)

Wiktor Stribiżew

Reputation: 627341

You can use

(\w+?)\s+(#\s*\(.*\)\s*)?(\w+(?:\[\d+\])?)\s*\(

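In a Tcl regex, greediness is set by the first quantifier in the pattern: it decides whether the overall match is the longest or the shortest possible. So, if you use \w+? as the first quantified subpattern, the whole match becomes as short as possible, and subsequent + and * quantifiers effectively behave like +? and *?. This is also why the .*? in your original pattern had no effect: its first quantifier is the greedy \w+, so the overall match was as long as possible and ran from second_module all the way to inst4 (.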
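To see the rule in isolation, here is a small sketch. The sample string m #(a) x #(b) and the variable names are made up for illustration and do not come from the question. Each regexp call stores its whole match in a variable; between the second and third patterns only the first quantifier changes.

```tcl
# Tcl AREs take the overall match length from the first quantifier that
# has a preference; a later lazy quantifier cannot override it.
set s {m #(a) x #(b)}

# First quantifier is .*? (lazy): the overall match is as short as possible.
regexp {#\(.*?\)} $s lazy

# First quantifier is \w+ (greedy): the overall match is as long as possible,
# so the later .*? does not stop at the first closing paren.
regexp {\w+\s+#\(.*?\)} $s greedy

# Making the first quantifier lazy flips the whole match to shortest.
regexp {\w+?\s+#\(.*?\)} $s fixed

puts "lazy:   $lazy"
puts "greedy: $greedy"
puts "fixed:  $fixed"
```

This should print #(a), then m #(a) x #(b), then m #(a): the greedy \w+ drags the whole match out to the last closing paren, which is the same thing that happened with inst4 in the question.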

If you want to test this regex in a PCRE-compliant regex tester, every quantifier has to be made lazy explicitly, so the pattern above should be written as

(\w+?)\s+?(#\s*?\(.*?\)\s*?)?(\w+?(?:\[\d+?\])??)\s*?\(


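This regex works because \w+? at the start of the pattern behaves the same as \w+ here: it is followed by an obligatory \s, so it still has to consume the whole module name. The remaining lazy patterns work because each of them is followed by an obligatory pattern (the final \( is especially important), so each match stops at the first instance_name ( it reaches instead of running on to the last one.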
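Putting it together, a minimal sketch of the question's proc using this pattern might look like the following. It assumes Tcl 8.5 or later (for the {*} expansion syntax), and it returns a flat module/instance list in addition to printing; that return value is an addition for easy checking, not part of the original code. Note that -expanded is dropped: in expanded syntax an unescaped # starts a comment, so the #\s*\( part of the pattern would be swallowed. If you want to keep -expanded, write it as \#, as the question's original pattern did.

```tcl
# Sketch: the question's proc rewritten around the lazy-first pattern.
# -expanded is omitted because an unescaped # would start a comment there,
# and this pattern contains no whitespace that needs ignoring anyway.
proc extract_verilog_instances {text} {
    # The first quantifier, \w+?, is lazy, so each overall match is as short
    # as possible and ends at the first instance name followed by a paren.
    set rexp {(\w+?)\s+(#\s*\(.*\)\s*)?(\w+(?:\[\d+\])?)\s*\(}

    set result {}
    # With -all -indices, each match yields 4 index pairs:
    # the whole match followed by the 3 capturing groups.
    foreach {m0 m1 m2 m3} [regexp -inline -all -indices $rexp $text] {
        set module   [string range $text {*}$m1]
        set instance [string range $text {*}$m3]
        puts "module:$module instance:$instance"
        lappend result $module $instance
    }
    return $result
}

set vlog {
    second_module #(2) inst2 (.in2(sig2), .out2(sig3));
    third_module inst3 (.in3(sig3), .out3(sig4));
    fourth_module #(.in4_clk_freq(50), .in4_rst_val(1'b0)) inst4 (.in4_clk(clk), .in4_rst(rst), .in4_in1(sig4), .in4_in2(sig5), .out4(sig6));
}

extract_verilog_instances $vlog
```

On the question's sample input this should report second_module/inst2, third_module/inst3 and fourth_module/inst4.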

Upvotes: 2
