Tamilan

Reputation: 981

regsub/regexp parsing of a list of elements in Tcl

I need to convert a string containing a list of several elements, each like (<>,abcd1,1,1), as shown below.

From:

test={abc([(<>,yifow3,1,1),(abc,yifow3,2,2,20140920,20151021),(<>,yifow3,3,3,20140920,20151021),(<>,yifow3,4,4)])}

To:

abc([(yifow3,1,1),(yifow3,2,2),(yifow3,3,3),(yifow3,4,4)])

I tried to extract the list inside abc([...]) using the regsub below. The string will always have "abc([" at the beginning and "])" at the end.

regsub -all {(abc\(\[)([a-z0-9\<\>\(\),]+)(\]\))} $test {\2} test2

Then, from test2, I use a for loop to extract the second, third and fourth items from each element such as (<>,abcd1,1,1).

Is there a simpler way to do this extraction with regsub/regexp instead of a for loop?

The regex should extract the second, third and fourth items, ignoring the first item and also the fifth and sixth items if they are present.

Upvotes: 0

Views: 1615

Answers (2)

glenn jackman

Reputation: 246799

regsub -all -expanded {
    \(                        # a literal parenthesis
    [^(,]+ ,                  # 1 or more chars other than '(' or ',', then a comma
    ( [^,]+ , \d+ , \d+ )     # the 3 fields to keep with commas
    [^)]*                     # 0 or more chars other than ')'
    \)                        # a literal parenthesis
} $test {(\1)}

returns

abc([(yifow3,1,1),(yifow3,2,2),(yifow3,3,3),(yifow3,4,4)])
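For reference, here is the same pattern collapsed onto one line so that -expanded is not needed, with the result captured in a variable. It is only a restatement of the expanded regex above, not a different technique; the variable name result is my own choice.

```tcl
set test {abc([(<>,yifow3,1,1),(abc,yifow3,2,2,20140920,20151021),(<>,yifow3,3,3,20140920,20151021),(<>,yifow3,4,4)])}

# Same regex as the -expanded version, minus whitespace and comments.
# \1 is the capture group holding the three fields to keep.
set result [regsub -all {\([^(,]+,([^,]+,\d+,\d+)[^)]*\)} $test {(\1)}]
puts $result
```

Called without a trailing variable name, regsub returns the substituted string, which is why it can be used directly inside [...].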

Upvotes: 1

Jerry

Reputation: 71538

OK, based strictly on what you have in your question, and if you are already sure the string begins with abc([ and ends with ]), you could first get every innermost parenthesized group (parentheses included) with a regex:

set test {abc([(<>,yifow3,1,1),(abc,yifow3,2,2,20140920,20151021),(<>,yifow3,3,3,20140920,20151021),(<>,yifow3,4,4)])}
set items [regexp -all -inline -- {\([^()]+\)} $test]
# (<>,yifow3,1,1) (abc,yifow3,2,2,20140920,20151021) (<>,yifow3,3,3,20140920,20151021) (<>,yifow3,4,4)

Then you can loop through each one (strip the parentheses, split on commas, take the 2nd to 4th elements and join them back together, etc.).
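A minimal sketch of that loop, assuming every element is wrapped in exactly one pair of parentheses and has at least four comma-separated fields. The variable names results and output are just illustrative.

```tcl
set test {abc([(<>,yifow3,1,1),(abc,yifow3,2,2,20140920,20151021),(<>,yifow3,3,3,20140920,20151021),(<>,yifow3,4,4)])}
set items [regexp -all -inline -- {\([^()]+\)} $test]

set results {}
foreach item $items {
    # Drop the surrounding parentheses: "(<>,yifow3,1,1)" -> "<>,yifow3,1,1"
    set fields [split [string range $item 1 end-1] ,]
    # Keep the 2nd to 4th fields (list indices 1 to 3) and re-wrap them
    lappend results "([join [lrange $fields 1 3] ,])"
}
# Put the abc([ ... ]) wrapper back around the cleaned elements
set output "abc(\[[join $results ,]\])"
puts $output
```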

I don't think you can avoid a loop if you want to keep it simple. You can skip a few steps, I guess, with a more elaborate (and no longer simple!) regex:

set test {abc([(<>,yifow3,1,1),(abc,yifow3,2,2,20140920,20151021),(<>,yifow3,3,3,20140920,20151021),(<>,yifow3,4,4)])}
set items [regexp -all -inline -- {\([^,()]+((?:,[^,()]+){3})} $test]
set results [lmap {a b} $items {list [string trim $b ,]}]
# yifow3,1,1 yifow3,2,2 yifow3,3,3 yifow3,4,4

The regex here \([^,()]+((?:,[^,()]+){3}) matches as follows:

\(                 # Literal opening paren
[^,()]+            # One or more characters other than ',', '(' and ')'
(                  # Capture group 1
  (?:,[^,()]+){3}  # A comma followed by one or more characters other than
                   # ',', '(' and ')', the whole thing exactly 3 times
)
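To see why the lmap above takes two loop variables ({a b}): with one capture group, regexp -all -inline returns a flat list that alternates the whole match and group 1. A quick way to inspect that, using the same $test and pattern:

```tcl
set test {abc([(<>,yifow3,1,1),(abc,yifow3,2,2,20140920,20151021),(<>,yifow3,3,3,20140920,20151021),(<>,yifow3,4,4)])}
set items [regexp -all -inline -- {\([^,()]+((?:,[^,()]+){3})} $test]

# Each iteration consumes two list elements: the full match, then group 1
foreach {match group} $items {
    puts "match: $match    group 1: $group"
}
```

The whole match still carries the leading "(" and the first field, which is why only the second variable (group 1) is kept.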

I used lmap (Tcl 8.6+) here, which is basically a kind of loop anyway. You can tweak it a bit to get the exact string you are looking for:

set results [lmap {a b} $items {list "([string trim $b ,])"}]
set output "abc(\[[join $results ,]])"
# abc([(yifow3,1,1),(yifow3,2,2),(yifow3,3,3),(yifow3,4,4)])
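If you are stuck on Tcl 8.5, which has no lmap, the same result can be built with foreach and lappend. This sketch assumes group 1 always starts with exactly one comma, so it uses string range ... 1 end to drop it instead of string trim:

```tcl
set test {abc([(<>,yifow3,1,1),(abc,yifow3,2,2,20140920,20151021),(<>,yifow3,3,3,20140920,20151021),(<>,yifow3,4,4)])}
set items [regexp -all -inline -- {\([^,()]+((?:,[^,()]+){3})} $test]

set results {}
foreach {match group} $items {
    # group is e.g. ",yifow3,1,1": drop the leading comma, re-wrap in parens
    lappend results "([string range $group 1 end])"
}
set output "abc(\[[join $results ,]\])"
puts $output
```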

Upvotes: 1
