SandBag_1996

Reputation: 1601

Regex to extract data between two labels

I have a file svn_logs.txt with the following data:

:SUMMARY: This module is test created
:TIME: the current time is not listed

I am using Tcl and a regex to extract the summary from this file.

set svn_logs svn_logs.txt
set fp [open $svn_logs r]
set lines [split [read -nonewline $fp] "\n"]
close $fp
foreach line $lines {
    if {[regexp -nocase {^\s*(:SUMMARY)\s*:\s*(.*)$} $line match tag value]} {
        set [string tolower $tag] $value
    }
}
puts $value

It works fine as long as the summary has only one line. But there are cases where the summary contains several points:

:SUMMARY: Following changes needs to be added
1. this one
2. this one too
:TIME:

In this case, it extracts nothing beyond the first line. I am having a hard time modifying the regex above to capture everything between :SUMMARY: and :TIME:. I am new to regex; can anyone offer any input?

Original content of the file:

------------------------------------------------------------------------
r743 | aaddh | 2014-04-01 12:33:42 -0500 (Tue, 01 Apr 2014) | 8 lines

:SUMMARY: Modified file to add following changes:
1.Loop to avoid . 
2.Change directory 
3.The batch file
:TIME: Invalid
:Test:
:Comments:

Upvotes: 1

Views: 1157

Answers (3)

glenn jackman

Reputation: 247012

The regexp solution is very compact. If you're reading the file line by line instead, you could do:

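# "file" stands for the path to the log file
set fh [open file r]
set insumm false
# Start collecting at :SUMMARY: and stop at the next line that begins with a :LABEL:
while {[gets $fh line] != -1} {
    switch -regex -- $line {
        {^:SUMMARY:} {set insumm true; set summary [string range $line 10 end]}
        {^:\w+:} break
        default {if {$insumm} {append summary \n $line}}
    }
}
close $fh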
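
For trying this out without a file on disk, here is a minimal sketch of the same line-by-line idea wrapped in a proc that takes the log text as a string. The proc name extractSummary and the sample text are assumptions made for illustration. It is a variation rather than the exact code above: a label only ends the scan once the summary has started.

```tcl
# Hypothetical helper: collect the text between :SUMMARY: and the next :LABEL: line.
proc extractSummary {text} {
    set insumm false
    set summary ""
    foreach line [split $text "\n"] {
        if {[regexp {^:SUMMARY:\s*(.*)$} $line -> rest]} {
            # Start of the summary: keep whatever follows the label.
            set insumm true
            set summary $rest
        } elseif {$insumm && [regexp {^:\w+:} $line]} {
            # Any other label ends the summary.
            break
        } elseif {$insumm} {
            append summary "\n" $line
        }
    }
    return $summary
}

# Sample text modeled on the log in the question.
set log ":SUMMARY: Modified file to add following changes:
1.Loop to avoid .
2.Change directory
3.The batch file
:TIME: Invalid
:Test:
:Comments:"

puts [extractSummary $log]
```

This should print the summary line followed by the three numbered points and nothing from :TIME: onward.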

Upvotes: 1

Jerry

Reputation: 71578

You will have to use a different approach if you really want to use a regex: read the whole file in one go and apply the regex to the entire contents:

set svn_logs svn_logs.txt
set fp [open $svn_logs r]
set lines [read -nonewline $fp]
close $fp
regexp -nocase -lineanchor -- {^\s*(:SUMMARY)\s*:\s*(.*?):TIME:$} $lines match tag value
puts $value

With as input:

:SUMMARY: Following changes needs to be added
1. this one
2. this one too
:TIME:

You get:

Following changes needs to be added
1. this one
2. this one too


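The -lineanchor flag makes ^ match at the beginning of every line and $ at the end of every line, instead of only at the start and end of the whole string. The -- marks the end of the switches, so a pattern that starts with - is not mistaken for an option.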
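
A small sketch of what these two switches change; the sample strings are made up for illustration:

```tcl
set text "first line\n:TIME: Invalid\nlast line"

# Without -lineanchor, ^ only matches at the very start of the string.
puts [regexp {^:TIME:} $text]              ;# prints 0
# With -lineanchor, ^ also matches right after each newline.
puts [regexp -lineanchor {^:TIME:} $text]  ;# prints 1
# -- ends option parsing, so a pattern starting with "-" is taken as the pattern.
puts [regexp -- {-0500} "12:33:42 -0500"]  ;# prints 1
```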

Note: There's a lingering newline at the end of the captured group; you can remove it with string trimright if required.
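
In the original log from the question, :TIME: is followed by text (":TIME: Invalid"), so a pattern that requires :TIME: at the end of a line would not match there. A possible variation, offered as a sketch rather than a drop-in replacement, stops at the first line that begins with any :LABEL: and trims the capture. Tcl's regex documentation says a branch takes its greedy or non-greedy preference from its first quantifier, so this sketch keeps (.*?) as the first quantifier instead of starting with \s*.

```tcl
# Sample taken from the question's original log; note the text after :TIME:.
set lines ":SUMMARY: Modified file to add following changes:
1.Loop to avoid .
2.Change directory
3.The batch file
:TIME: Invalid
:Test:
:Comments:"

set summary ""
# (.*?) is the first quantifier, so the match ends at the first line
# that begins with a :LABEL: rather than the last one.
if {[regexp -lineanchor -- {:SUMMARY:(.*?)^:\w+:} $lines -> value]} {
    # Drop the leading space and the trailing newline left in the capture.
    set summary [string trim $value]
}
puts $summary
```

Checking the return value of regexp before using $value also avoids an error when the log has no :SUMMARY: block.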

Upvotes: 2

mckurt

Reputation: 154

You can try something like: [^:SUMMARY:](.*)[^:TIME:]

Upvotes: -1
