Reputation: 1716
I am extracting voltage values from a file name. The name contains three such values, but regexp returns only two.
set data "blabla_0p500v_0p530v_0p550v_m25c_foo.dat"
regexp -all -inline {_(\dp\d{3})v_} $data
Returns:
_0p500v_ 0p500 _0p550v_ 0p550
I was expecting:
_0p500v_ 0p500 _0p530v_ 0p530 _0p550v_ 0p550
Not sure what's missing.
Thanks for your help.
Upvotes: 0
Views: 77
Reputation: 48711
Use a positive lookahead:
_(\dp\d{3})v(?=_)
This way the trailing underscore is not consumed and remains available to be matched at the start of the next iteration.
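As a quick check of the difference, the sketch below runs both patterns against the file name from the question. The variable names consuming and lookahead are only illustrative.

```tcl
set data "blabla_0p500v_0p530v_0p550v_m25c_foo.dat"

# Original pattern: the trailing "_" is part of the match, so the next
# search starts after it and the middle value is skipped.
set consuming [regexp -all -inline {_(\dp\d{3})v_} $data]

# Lookahead pattern: the trailing "_" is only checked, not consumed.
set lookahead [regexp -all -inline {_(\dp\d{3})v(?=_)} $data]

puts $consuming   ;# _0p500v_ 0p500 _0p550v_ 0p550
puts $lookahead   ;# _0p500v 0p500 _0p530v 0p530 _0p550v 0p550
```

Note that with the lookahead each full match no longer ends in _, because that character is left for the next match.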
To append _ to the matched part:
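# The lookahead keeps the trailing "_" out of each match, so add it back by hand.
set output [regexp -all -inline {_(\dp\d{3})v(?=_)} $data]
set index 0
foreach item $output {
    # Even positions hold the full match, odd positions the capture group.
    puts [expr {$index % 2 == 0 ? "${item}_" : $item}]
    incr index
}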
Upvotes: 2
Reputation: 626738
You may keep your pattern, but iterate over the string looking for every occurrence of its first character, _ (if the first character were not hardcoded, this could be done with a regex and the -indices option, but here a plain string first is enough), and check for a regex match at each of those positions. If a match is found, lappend the whole match and the first capture to the list.
See the Tcl code demo:
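set data "blabla_0p500v_0p530v_0p550v_m25c_foo.dat"
# \A anchors the match at the -start offset, so each "_" is tested in place
# and a later match is never picked up twice.
set RE {\A_(\dp\d{3}v)_}
set result {}
set idx [string first "_" $data 0]
while {$idx > -1} {
    # Try the pattern at this underscore; keep the whole match and the capture.
    if {[regexp -start $idx $RE $data whole between] == 1} {
        lappend result $whole $between
    }
    # Move on to the next underscore, which may be the one that ended this match.
    set idx [string first "_" $data $idx+1]
}
puts $result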
Output:
_0p500v_ 0p500v _0p530v_ 0p530v _0p550v_ 0p550v
Note that you may also use @revo's approach, but you will then have to rebuild the output by going through the resulting list and appending _ to each item that starts with _:
set data "blabla_0p500v_0p530v_0p550v_m25c_foo.dat"
set RE {_(\dp\d{3}v)(?=_)}
set ms [regexp -all -inline $RE $data]
set result {}
foreach m $ms {
    # Full matches start with "_"; restore the underscore the lookahead left out.
    if {[string index $m 0] eq "_"} {
        lappend result "${m}_"
    } else {
        lappend result $m
    }
}
puts $result
Just to clarify what "does not consume" means here: (?=_), a non-consuming pattern, does not put the _ into the regex match value, and the regex index stays right before the _ once the lookahead has been checked. Thus, the next match can start right before this _.
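One way to see this is to ask regexp for match positions instead of substrings. The following is a small sketch using the file name from the question; the variable names consumed and peeked are only illustrative.

```tcl
set data "blabla_0p500v_0p530v_0p550v_m25c_foo.dat"

# -indices reports {first last} character positions instead of substrings.
# Each match contributes two pairs: the whole match, then the capture group.
set consumed [regexp -all -inline -indices {_(\dp\d{3})v_} $data]
set peeked   [regexp -all -inline -indices {_(\dp\d{3})v(?=_)} $data]

puts $consumed   ;# {6 13} {7 11} {20 27} {21 25}
puts $peeked     ;# {6 12} {7 11} {13 19} {14 18} {20 26} {21 25}
```

With the original pattern the first match ends at index 13, which is the underscore the second value needed. With the lookahead it ends at index 12, so the next match can begin at 13.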
Upvotes: 0