ratzip

Reputation: 113

How to match two digits with regexp in Tcl?

I have the following string that I need to match using regexp:

"The value is 0x0208 and the type is INTERNATION"

I want to get the digits 02 and 08 and store them in two different variables. I use the following regexp:

regexp "0x(\[0-9]+)\[^\\n]+INTERNATION" "The value is 0x0208 and the type is INTERNATION" whole first second

It does not set the second variable. How can I fix it?

Upvotes: 0

Views: 5319

Answers (1)

Bryan Oakley

Reputation: 386210

First, use curly braces for regular expressions. They make the patterns much easier to read because you don't have to use extra backslashes.

Second, use \d for digits to make the expression a little shorter, which also improves readability.
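
As a rough illustration of both points, the sketch below runs the same single-group match twice, once with a double-quoted pattern and once with a braced pattern that uses \d. The variable names (s, whole1, digits1, and so on) are made up for this example.

```tcl
set s "The value is 0x0208 and the type is INTERNATION"

# Double-quoted pattern: the brackets are backslash-escaped so that
# Tcl does not treat them as command substitution.
regexp "0x(\[0-9\]+)" $s whole1 digits1

# Braced pattern with \d: no Tcl-level escaping is needed.
regexp {0x(\d+)} $s whole2 digits2

puts "$digits1 $digits2"   ;# prints: 0208 0208
```

Both forms capture the same text; the braced one is simply easier to read.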

Searching for pairs of digits

In your description you say you want to search for two pairs of digits following 0x. Here's a simple way to do that:

{0x(\d\d)(\d\d)}

This says "0x, followed by two digits that we capture, followed by two digits that we capture."
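
Applied to the string from the question, that pattern could be used roughly like this. The variable names whole, first and second are taken from the question; s is just a placeholder for the input.

```tcl
set s "The value is 0x0208 and the type is INTERNATION"

# regexp returns 1 on a match and fills one variable per capture group.
if {[regexp {0x(\d\d)(\d\d)} $s whole first second]} {
    puts "whole=$whole first=$first second=$second"
    ;# prints: whole=0x0208 first=02 second=08
}
```

Because the pattern has two capture groups, both first and second are set, which is what the original single-group expression could not do.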

Searching for hexadecimal characters

Typically, hex numbers are preceded by 0x, which makes me think you are actually trying to parse a hex number. If that's true, you need to search for more than just digits. To match a hex digit you need to use [0-9a-f]. Once a pattern gets slightly long (e.g. [0-9a-f] vs. \d), you don't want to keep repeating it, so another way to say "two of these" is to use {2} rather than repeating the pattern.

Putting that all together, to match two groups of two hex digits you could use something like this:

{0x([0-9a-f]{2})([0-9a-f]{2})}
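
Here is a small sketch of that pattern in use. The input string with the value 0x1a2f is an invented sample, not data from the question.

```tcl
# Hypothetical input containing lowercase hex digits.
set s "The value is 0x1a2f"

# {2} means "exactly two of the preceding character class".
if {[regexp {0x([0-9a-f]{2})([0-9a-f]{2})} $s whole first second]} {
    puts "first=$first second=$second"   ;# prints: first=1a second=2f
}
```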

Dealing with upper and lower case

Note that this pattern assumes the hex digits are lowercase. If your particular data might have uppercase letters there are at least four ways to handle that:

  1. use the -nocase option to the regexp command
  2. use both upper and lowercase characters in the expression
  3. convert the string to lowercase before matching
  4. add embedded options to turn off case sensitivity
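
The first three options might look something like the following. The uppercase sample 0x0A2F and the variable names are assumptions made for the sake of the example.

```tcl
# Hypothetical input with uppercase hex digits.
set s "The value is 0x0A2F"
set re {0x([0-9a-f]{2})([0-9a-f]{2})}

# 1. -nocase: the pattern is matched without regard to case.
regexp -nocase $re $s whole a1 b1

# 2. List both cases explicitly in the character class.
regexp {0x([0-9a-fA-F]{2})([0-9a-fA-F]{2})} $s whole a2 b2

# 3. Lowercase the input first; the captures come back lowercase too.
regexp $re [string tolower $s] whole a3 b3

puts "$a1 $b1 | $a2 $b2 | $a3 $b3"   ;# prints: 0A 2F | 0A 2F | 0a 2f
```

Note that option 3 changes the case of the captured text, which may or may not matter for your data.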

Of those, the last is likely the least obvious solution, so I'll present it here.

Tcl regular expressions can have a special sequence at the very start of the pattern that modifies how the regular expression works. In this case we want to tell it to ignore case. The way to do that is to add (?i) at the start of the pattern:

{(?i)0x([0-9a-f]{2})([0-9a-f]{2})}
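
A minimal sketch of the embedded option in use, again with an invented uppercase sample:

```tcl
# Hypothetical input with uppercase hex digits.
set s "The value is 0x0A2F"

# (?i) at the very start makes the whole pattern case-insensitive.
if {[regexp {(?i)0x([0-9a-f]{2})([0-9a-f]{2})} $s whole first second]} {
    puts "first=$first second=$second"   ;# prints: first=0A second=2F
}
```

The captured text keeps its original case; only the matching is case-insensitive.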

For more information on embedded options, see the metasyntax section of the re_syntax man page.

Upvotes: 5
