jay

Reputation: 79

Want to know how to match up to a certain string - tcl regexp

How do I pattern match this in TCL?

set game1 base_ball_ABC_10_100_a_b_c
set game2 base_ball_CDE_20_200_d_e_f
set game3 base_bat_DEF_40_400_j_k_l

The regexp below does not work as intended: its first group matches everything up to the first digit (i.e. it stops just before 10, 20):

if { ![regexp {^([^0-9]+)(.*)$} $k3 - bb digit] } {

What regexp would match base_ball and ABC_10_100_a_b_c? The one above seems to match only up to the first digit.

I would like it to match [base_ball or base_bat] and then match [ABC_10_100, CDE_20_200].

Upvotes: 0

Views: 155

Answers (1)

Jerry

Reputation: 71538

You don't need regexp if the strings all have the same number of underscores and all you need is the text before and after the 2nd underscore:

set k3 {base_ball_ABC_10_100_a_b_c}
set parts [split $k3 "_"]
set part1 [join [lrange $parts 0 1] "_"]
# base_ball
set part2 [join [lrange $parts 2 end] "_"]
# ABC_10_100_a_b_c
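
To check the same idea against all three strings from the question, the steps can be wrapped in a small proc. This is only a sketch: the proc name splitAtSecondUnderscore is made up for illustration, and lassign assumes Tcl 8.5 or later.

```tcl
# Hypothetical helper (not part of the original answer): split a string at
# its second underscore and return both halves as a two-element list.
proc splitAtSecondUnderscore {str} {
    set parts [split $str "_"]
    set head [join [lrange $parts 0 1] "_"]    ;# first two fields
    set tail [join [lrange $parts 2 end] "_"]  ;# everything after them
    return [list $head $tail]
}

# The three sample strings from the question.
set games {
    base_ball_ABC_10_100_a_b_c
    base_ball_CDE_20_200_d_e_f
    base_bat_DEF_40_400_j_k_l
}

foreach name $games {
    lassign [splitAtSecondUnderscore $name] head tail
    puts "$head | $tail"
}
# base_ball | ABC_10_100_a_b_c
# base_ball | CDE_20_200_d_e_f
# base_bat | DEF_40_400_j_k_l
```

This only holds while the prefix always has exactly two underscore-separated fields; a prefix such as base_foot_ball would be cut after base_foot.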

Taking your previous question into consideration as well, you may not need to join either if you're only counting the unique values, so doing

set k3 {base_ball_ABC_10_100_a_b_c}
set parts [split $k3 "_"]
set part1 [lrange $parts 0 1]
set part2 [lrange $parts 2 end]

should probably be enough.
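
As a rough illustration of counting without joining, the unjoined prefix list can be used directly as a dictionary key. This is a sketch under assumptions: the games and counts variables are made up here, the exact counting needed by the earlier question is not known, and dict requires Tcl 8.5 or later.

```tcl
# Sample strings from the question; the variable names are illustrative only.
set games {
    base_ball_ABC_10_100_a_b_c
    base_ball_CDE_20_200_d_e_f
    base_bat_DEF_40_400_j_k_l
}

set counts [dict create]
foreach k3 $games {
    set parts [split $k3 "_"]
    set part1 [lrange $parts 0 1]   ;# a list such as {base ball}, left unjoined
    dict incr counts $part1         ;# the list's string form works as a key
}

puts "unique prefixes: [dict size $counts]"
dict for {prefix n} $counts {
    puts "$prefix: $n"
}
```

With the three sample strings this gives two unique prefixes, {base ball} seen twice and {base bat} seen once.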

Docs for split, lrange, join


If you still want to use regexp, then I'd advise reading re_syntax. The right expression really depends on a lot of things: many different expressions will work with your data, but the best one can only be crafted by knowing the data and its edge cases. From what I can assume, I'd guess something like this may work:

regexp {^([^_]+_[^_]+)_(.+)$} $k3 - bb digit

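Here [^_]+ matches one or more characters that are not _, so the expression above breaks down as:

  • ^ - beginning of string
  • [^_]+ - one or more non-_ characters
  • _ - one _
  • [^_]+ - one or more non-_ characters
  • _ - one _
  • .+ - one or more of any character
  • $ - end of string

The first pair of parentheses captures everything before the 2nd underscore into bb, and the second pair captures everything after it into digit.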
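
To see what the expression captures, it can be run over the three strings from the question. The second pattern below is an assumption, not part of the original answer: it guesses that the shorter form asked about (ABC_10_100, CDE_20_200) is always one field followed by two all-digit fields.

```tcl
# Sample strings from the question; games, rest and short are illustrative names.
set games {
    base_ball_ABC_10_100_a_b_c
    base_ball_CDE_20_200_d_e_f
    base_bat_DEF_40_400_j_k_l
}

set full {}
set trimmed {}
foreach k3 $games {
    # Expression from the answer: prefix, then everything after the 2nd underscore.
    if {[regexp {^([^_]+_[^_]+)_(.+)$} $k3 - bb rest]} {
        lappend full [list $bb $rest]
    }
    # Assumed variant: stop after NAME_digits_digits and ignore the remainder.
    if {[regexp {^([^_]+_[^_]+)_([^_]+_[0-9]+_[0-9]+)} $k3 - bb short]} {
        lappend trimmed [list $bb $short]
    }
}

foreach pair $full    { puts [join $pair " | "] }
foreach pair $trimmed { puts [join $pair " | "] }
```

For the first string the two patterns yield base_ball with ABC_10_100_a_b_c, and base_ball with ABC_10_100, respectively.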

Upvotes: 2
