user3512999

Reputation: 149

Tcl regexp: extraction of all elements

I have simple strings like the one below:

set x "\ \ a\ b\ \ a\ b\ b\ a\ \ \ "  

I am trying to extract all occurrences of "a" and "b" by using the following regexp:

set match [regexp -all -inline {(\S+)} $x]

But that gives me:

a a b b a a b b b b a a

I was expecting:

a b a b b a

What am I doing wrong?

Thanks.

Upvotes: 0

Views: 819

Answers (1)

Donal Fellows

Reputation: 137567

The -all -inline option combination makes regexp return a flat list that contains, for every match it finds, the whole match followed by each capturing submatch. Your regular expression includes a capturing submatch that happens to be the same as the entire match, so every value appears twice.
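To make the layout of that list visible, here is a minimal sketch with a made-up input ("foo bar" is not from the question) and a pattern whose capturing group differs from the whole match:

```tcl
# Hypothetical input, chosen so the whole match and the submatch differ.
set sample "foo bar"

# (\S) captures only the first character; \S* consumes the rest of the word.
set pairs [regexp -all -inline {(\S)\S*} $sample]

# Each match contributes two elements: the whole match, then the submatch.
puts $pairs
```

This should print `foo f bar b`: the odd positions hold the whole matches and the even positions hold the captured first characters.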

Try this:

set match [regexp -all -inline {\S+} $x]

If you need parentheses only for grouping, use the non-capturing form (?:…) instead of (…).
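As a rough illustration of the non-capturing form, the sketch below uses the string from the question and an alternation pattern chosen here purely as an example:

```tcl
# Same input as in the question: "a" and "b" separated by runs of spaces.
set x "\ \ a\ b\ \ a\ b\ b\ a\ \ \ "

# (?:a|b)+ groups the alternation so that + applies to it, without capturing,
# so -all -inline returns only the whole matches.
set match [regexp -all -inline {(?:a|b)+} $x]

puts $match
```

With this input it should print `a b a b b a`, with no duplicated elements.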

If you have to have capturing groups because you're matching something more complex, you can filter the result with lmap (8.6 or later) or foreach:

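# With lmap (Tcl 8.6 or later): keep the whole match, skip the submatch.
set match [lmap {matched ignored} [regexp -all -inline {(\S+)} $x] {
    set matched
}]

# Equivalent with foreach: build the same list by hand.
set match {}
foreach {matched ignored} [regexp -all -inline {(\S+)} $x] {
    lappend match $matched
}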

Note that we're using two iteration variables here and one list, so we pick off elements by twos. Using three iteration variables would pick them off by threes, and so on. (The lmap command is just like foreach except that it produces a list of the values obtained by evaluating its body script, whereas foreach throws those body script results away.)
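A sketch of the three-variable case, assuming a hypothetical input of key=value pairs (not from the question) and a pattern with two capturing groups:

```tcl
# Hypothetical input: each item has the form name=number.
set s "w=10 h=20 d=30"

# Two capturing groups mean each match contributes three list elements:
# the whole match, the key, and the value.
set result {}
foreach {whole key value} [regexp -all -inline {(\w+)=(\d+)} $s] {
    # Keep only the captured parts and drop the whole match.
    lappend result $key $value
}

puts $result
```

This should print `w 10 h 20 d 30`. The number of iteration variables has to equal one plus the number of capturing groups, otherwise the elements fall out of step.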

Upvotes: 2
