Sumit

Reputation: 2023

Matching a regexp in TCL PERL

I have the following pattern:

    Pattern[1]: 
    Key : "key1" 
    Value : 100
    Pattern[2]: 
    Key : "key2" 
    Value : 20
    Pattern[3]: 
    Key : "key3" 
    Value : 30
    Pattern[4]: 
    Key : "key4" 
    Value : 220

I want to separate each Pattern block. I am using Tcl. The regexp I am using does not do the job:

set updateList [regexp -all -inline {Pattern\[\d+\].*?Value.*?\n} $list]

Which regexp should I use to separate each pattern?

I need the output as:

    Pattern[1]: 
    Key : "key1" 
    Value : 100


    Pattern[2]: 
    Key : "key2" 
    Value : 20


    Pattern[3]: 
    Key : "key3" 
    Value : 30


    Pattern[4]: 
    Key : "key4" 
    Value : 220

Upvotes: 1

Views: 292

Answers (4)

Peter Lewerin

Reputation: 13272

You want to capture blocks of lines and output them with blank lines in between. Your example data displays patterns on different levels that can be used to recognize which lines belong to which block.

The simplest pattern is this: every three lines in the input make up a block. This pattern suggests processing like this:

set lines [split [string trim $list \n] \n]
foreach {a b c} $lines {puts $a\n$b\n$c\n\n}

There is nothing in your example data that suggests that this wouldn't work. Still, there may be some complications that aren't reflected in your example data.

If there are stray blank lines in the input, you might need to get rid of them first:

set lines [lmap line $lines {if {[string is space $line]} continue else {set line}}]

If some blocks contain fewer or more lines than in your example, another simple pattern is that every block starts with a line that has optional whitespace followed by the word Pattern. Those lines (except the first) should be preceded by a block delimiter in the output:

set lines [split [string trim $list \n] \n]
puts [lindex $lines 0]
foreach line [lrange $lines 1 end] {
    if {[regexp {^\s*Pattern} $line]} {
        puts \n$line
    } else {
        puts $line
    }
}
puts \n

If the lines don't actually begin with whitespace, you could use string match Pattern* $line instead of the regular expression.
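As a minimal sketch of the string match variant, the loop below collects the blocks into a list instead of printing them. The unindented sample lines and the variable names blocks and current are assumptions made for illustration; they are not part of the question.

```tcl
# Hypothetical, unindented sample lines (assumption: no leading whitespace).
set lines [list {Pattern[1]:} {Key : "key1"} {Value : 100} \
                {Pattern[2]:} {Key : "key2"} {Value : 20}]

set blocks {}    ;# list of finished blocks, each block a list of lines
set current {}   ;# lines of the block being collected

foreach line $lines {
    # A line starting with "Pattern" opens a new block; close the previous one.
    if {[string match Pattern* $line] && [llength $current] > 0} {
        lappend blocks $current
        set current {}
    }
    lappend current $line
}
# Do not forget the last block.
if {[llength $current] > 0} {
    lappend blocks $current
}

foreach block $blocks {
    puts [join $block \n]\n
}
```

Keeping the blocks in a list makes it easy to process them further before printing.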

Documentation: continue, foreach, if, lindex, lmap, lmap replacement, lrange, puts, regexp, set, split, string

Upvotes: 1

glenn jackman

Reputation: 247042

% set list {    Pattern[1]: 
    Key : "key1" 
    Value : 100
    Pattern[2]: 
    Key : "key2" 
    Value : 20
    Pattern[3]: 
    Key : "key3" 
    Value : 30
    Pattern[4]: 
    Key : "key4" 
    Value : 220
}
% regexp -all -inline {Pattern\[\d+\].*?Value.*?\n} $list
{Pattern[1]: 
    Key : "key1" 
    Value : 100
    Pattern[2]: 
    Key : "key2" 
    Value : 20
    Pattern[3]: 
    Key : "key3" 
    Value : 30
    Pattern[4]: 
    Key : "key4" 
    Value : 220
}
% regexp -all -inline {Pattern\[\d+?\].*?Value.*?\n} $list   ;# only changing `\d+` to `\d+?`
{Pattern[1]: 
    Key : "key1" 
    Value : 100
} {Pattern[2]: 
    Key : "key2" 
    Value : 20
} {Pattern[3]: 
    Key : "key3" 
    Value : 30
} {Pattern[4]: 
    Key : "key4" 
    Value : 220
}

If $list does not end with a newline, the "Pattern[4]" element will not be returned. In that case, change

% regexp -all -inline {Pattern\[\d+?\].*?Value.*?\n} $list

to

% regexp -all -inline {Pattern\[\d+?\].*?Value.*?(?:\n|$)} $list
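The difference can be checked with a small script. This is only a sketch: the variable sample is a shortened, unindented copy of the question's data with the trailing newline deliberately stripped, and the names strict and relaxed are chosen for illustration.

```tcl
# Shortened copy of the question's data, without a trailing newline.
set sample [string trimright {Pattern[1]:
Key : "key1"
Value : 100
Pattern[2]:
Key : "key2"
Value : 20
} \n]

# Requires a newline after the Value line, so the last block is lost.
set strict  [regexp -all -inline {Pattern\[\d+?\].*?Value.*?\n} $sample]

# Accepts either a newline or the end of the string.
set relaxed [regexp -all -inline {Pattern\[\d+?\].*?Value.*?(?:\n|$)} $sample]

puts "newline required: [llength $strict] block(s)"
puts "newline or end:   [llength $relaxed] block(s)"
```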

Upvotes: 1

Wiktor Stribiżew

Reputation: 627128

Your pattern Pattern\[\d+\].*?Value.*?\n mixes greedy and lazy quantifiers. Tcl does not handle mixed quantifier types the way PCRE (PHP, Perl), .NET, etc. do: the first quantifier found sets the type, and the subsequent quantifiers inherit it. So, since the + after \d is greedy, all the others (the .*? parts) are treated as greedy too, even though you declared them lazy. Also, . matches a newline in Tcl regex, so your pattern matches the whole input as a single block.

So, based on your regex, you can make the \d+ lazy with \d+? and replace \n at the end with (?:\n|$) to match both the newline and the end of string:

set RE {Pattern\[\d+?\].*?Value.*?(?:\n|$)}
set updateList [regexp -all -inline $RE $str]

See the IDEONE demo
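To see the effect of the first quantifier in isolation, the two variants can be compared on a small input. This is a sketch under assumptions: sample is a shortened, unindented copy of the question's data, and greedy and lazy are names picked for illustration.

```tcl
# Shortened copy of the question's data, ending with a newline.
set sample {Pattern[1]:
Key : "key1"
Value : 100
Pattern[2]:
Key : "key2"
Value : 20
}

# First quantifier is greedy (\d+): the later .*? behave greedily as well.
set greedy [regexp -all -inline {Pattern\[\d+\].*?Value.*?\n} $sample]

# First quantifier is lazy (\d+?): the whole expression prefers short matches.
set lazy [regexp -all -inline {Pattern\[\d+?\].*?Value.*?\n} $sample]

puts "greedy first quantifier: [llength $greedy] match(es)"
puts "lazy first quantifier:   [llength $lazy] match(es)"
```

With the greedy variant everything is swallowed into one match; with the lazy variant each block is returned separately.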

Alternative 1

Also, you can use a more verbose regex if your input string always has the same structure with all elements - Pattern, Key, Value - present:

set updateList [regexp -all -inline {Pattern\[\d+\]:\s*Key[^\n]*\s*Value[^\n]*} $str]

See the IDEONE demo, and here is the regex demo.

Since a . can match a newline, we need to use a [^\n] negated character class matching any character but a line feed.

Alternative 2

You can use an unrolled lazy subpattern matching Pattern[n]: and then any character that is not a starting point for a Pattern[n]: sequence:

set RE {Pattern\[\d+\]:[^P]*(?:P(?!attern\[\d+\])[^P]*)*}
set updateList [regexp -all -inline $RE $str]

See another IDEONE demo and a regex101 demo

Upvotes: 2

Hp93

Reputation: 1535

Try this

Pattern\[\d+\](.|\n)*?Value.*?\n

In Perl-style (PCRE) engines the dot . matches any character except a line break, so you need to add \n explicitly. (In Tcl, . matches a newline by default.) Be aware that your lines may end with a carriage return, so you might need to add \r as well.
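For Tcl in particular, whether the dot crosses a line break depends on the options passed to regexp. A quick check, using a made-up two-line string s (an assumption for illustration):

```tcl
# Hypothetical two-line input: "a", newline, "b".
set s "a\nb"

# Default options: the dot is allowed to match the newline.
set plain [regexp {a.b} $s]

# With -line (newline-sensitive matching) the dot stops at the newline.
set lineMode [regexp -line {a.b} $s]

puts "default: $plain, -line: $lineMode"
```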

Upvotes: 1
