Gert Gottschalk

Reputation: 1716

TCL regexp not returning expected matches

I am extracting voltage values from a file name. The name contains three such values, but regexp returns only two.

set data "blabla_0p500v_0p530v_0p550v_m25c_foo.dat"
regexp -all -inline {_(\dp\d{3})v_} $data

Returns:

_0p500v_ 0p500 _0p550v_ 0p550

I was expecting:

_0p500v_ 0p500 _0p530v_ 0p530 _0p550v_ 0p550

Not sure what's missing.

Thanks for your help.

Upvotes: 0

Views: 77

Answers (2)

revo

Reputation: 48711

Use a positive lookahead:

_(\dp\d{3})v(?=_)

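This way the trailing underscore is not consumed and is still available to start the next match. In the original pattern the final _ of _0p500v_ becomes part of the first match, so _0p530v_ has no leading underscore left to match.

To append _ to the full-match items: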

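set output [regexp -all -inline {_(\dp\d{3})v(?=_)} $data]
set index 0
# Items alternate: full match (even index), captured value (odd index)
foreach item $output {
  # Re-append the underscore that the lookahead left out of the full match
  puts [expr {$index % 2 == 0 ? "${item}_" : $item}]
  incr index
}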
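
If only the values themselves are needed, the two-element stride of the result list can be walked directly. A minimal sketch using the question's file name; the variable names whole, value and voltages are illustrative, and reading p as a decimal point is an assumption about the naming scheme, not something the question states:

```tcl
set data "blabla_0p500v_0p530v_0p550v_m25c_foo.dat"

# regexp -all -inline returns a flat list: full match, capture, full match, capture, ...
set voltages {}
foreach {whole value} [regexp -all -inline {_(\dp\d{3})v(?=_)} $data] {
    # Assumption: "p" stands for the decimal point, so 0p500 means 0.500
    lappend voltages [string map {p .} $value]
}
puts $voltages   ;# 0.500 0.530 0.550
```

Iterating with two loop variables avoids the manual index counter, since each pass receives one full match and its capture together.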


Upvotes: 2

Wiktor Stribiżew

Reputation: 626738

You can keep your pattern, but iterate over the string, locating each occurrence of the first character, _, and test for a regex match at each of those positions. (If the first character were not hard-coded, you could locate it with a regex and the -indices option, but here a plain string first is enough.) When a match is found, lappend the whole match and the first capture to the list.

See the Tcl code demo:

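set data "blabla_0p500v_0p530v_0p550v_m25c_foo.dat"
# \A anchors the match at the -start offset, so each _ is tested in place
set RE {\A_(\dp\d{3}v)_}
set result {}
set idx [string first "_" $data 0]
while {$idx > -1} {
    if {[regexp -start $idx $RE $data whole between]} {
        lappend result $whole $between
    }
    # Move on to the next underscore, which may be the one that ended this match
    set idx [string first "_" $data $idx+1]
}
puts $result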

Output (the capture group in this pattern includes the trailing v):

_0p500v_ 0p500v _0p530v_ 0p530v _0p550v_ 0p550v

Note that you can also use @revo's lookahead approach, but then you have to rebuild the output by walking the resulting list and appending _ to the items that start with _:

set data "blabla_0p500v_0p530v_0p550v_m25c_foo.dat"
set RE {_(\dp\d{3}v)(?=_)}
set ms [regexp -all -inline $RE $data]
set result {}
foreach m $ms {
    # Full matches start with _; captured values do not
    if {[string index $m 0] eq "_"} {
        lappend result "${m}_"
    } else {
        lappend result $m
    }
}
puts $result


Just to clarify what "does not consume" means here: (?=_), a non-consuming pattern, does not put the _ into the regex match value, and the regex index stays right before the _ when the lookahead is evaluated. Thus, the next match can start at this very _.
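
The difference can be made visible with the -indices option, which reports character positions instead of substrings. A small sketch using the question's data (the variable names consuming and lookahead are only for illustration):

```tcl
set data "blabla_0p500v_0p530v_0p550v_m25c_foo.dat"

# Trailing _ is part of the match, so the search resumes after it
set consuming [regexp -all -inline -indices {_(\dp\d{3})v_} $data]
# Trailing _ is only asserted, so the search resumes on it
set lookahead [regexp -all -inline -indices {_(\dp\d{3})v(?=_)} $data]

puts $consuming   ;# {6 13} {7 11} {20 27} {21 25}
puts $lookahead   ;# {6 12} {7 11} {13 19} {14 18} {20 26} {21 25}
```

In the first list the match ending at index 13 swallows the underscore that 0p530 would need as its leading character. In the second, the first match ends at 12 and the next one starts at 13.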

Upvotes: 0
