chaney

Reputation: 1

How to use regexp in TCL to match a line from a file?

I am new to Tcl. I have been asked to extract the start date from a file, but my attempt produces no output. Please help.

My file contains this line, from which I want to extract the start date:

Running final_step.step_done at: Wed Oct 11 02:04:03 MYT 2017

My code:

proc extract_data {} {
    ## To extract startdate
    set file [open files/stages.files]
    while {[gets $file line] >= 0} {
        if {[regexp {^Running (\S+\s)at: (\S+.*)$} $line match Stage StartDate]} {
            if {[regexp "[$CURRENT_STAGE]\.step_done" $Stage]} {
                #set stage $Stage
                set end_date $StartDate
                set print_end_date [regsub -all " " $StartDate "_"]
                #echo "2) $stage - $end_date"
            } elseif {[regexp "^[$CURRENT_STAGE] " $Stage]} {
                #set stage $Stage
                set start_date $StartDate
                set print_start_date [regsub -all " " $StartDate "_"]
                #echo "1) $stage - $start_date"
            }
        }
    }
}

Is there something wrong with my regexp?

Upvotes: 0

Views: 4153

Answers (2)

Peter Lewerin

Reputation: 13282

It seems to me you should be able to get a lot done with code like this:

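while {[gets $file line] >= 0} {
    # The trailing * is needed: without it only a line that is exactly "Running" matches
    if {[string match "Running *" $line]} {
        set Stage [lindex [split $line] 1]
        # The time contains colons, so cut after the first "at: " instead of splitting on ":"
        set at [string first "at: " $line]
        set StartDate [string trim [string range $line [expr {$at + 4}] end]]
        if {[string match *.step_done $Stage]} {
            set end_date $StartDate
            set print_end_date [string map {" " _} $StartDate]
        } else {
            set start_date $StartDate
            set print_start_date [string map {" " _} $StartDate]
        }
    }
}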

That is,

  • check if the line starts with "Running"
  • get the word between "Running" and "at:" into Stage
  • get the date string after "at:" into StartDate
  • check whether $Stage ends with ".step_done"
  • if it does, set end_date to $StartDate and print_end_date to the same string with all blanks replaced with underscores
  • otherwise, do the same with start_date and print_start_date
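
As a rough sketch, the steps above can be wrapped in a small helper and tried on the sample line from the question. The proc name parse_running_line is hypothetical, and cutting at the first "at: " is one assumed way to keep the colons inside the time from splitting the date:

```tcl
# Hypothetical helper: parse one log line using only string commands.
# Returns a two-element list {stage date}, or an empty list for other lines.
proc parse_running_line {line} {
    if {![string match "Running *" $line]} {
        return {}
    }
    set stage [lindex [split $line] 1]
    # The time itself contains colons, so locate the literal "at: " marker.
    set at [string first "at: " $line]
    if {$at < 0} {
        return {}
    }
    set date [string trim [string range $line [expr {$at + 4}] end]]
    return [list $stage $date]
}

set line "Running final_step.step_done at: Wed Oct 11 02:04:03 MYT 2017"
lassign [parse_running_line $line] stage date
puts $stage
puts [string map {" " _} $date]
```

For the sample line this should print final_step.step_done followed by Wed_Oct_11_02:04:03_MYT_2017.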

Documentation: >= (operator), gets, if, lindex, set, split, string, while

Upvotes: 0

Donal Fellows

Reputation: 137787

The main RE looks fine (^Running (\S+\s)at: (\S+.*)$ does indeed match the line that you're talking about), but these RE matches look suspicious:

regexp "[$CURRENT_STAGE]\.step_done" $Stage
regexp "^[$CURRENT_STAGE] " $Stage
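
A quick way to check both claims is to run the pieces at the top level of a script. This is only a sketch: the value final_step for CURRENT_STAGE is an assumption, since the question does not show where the variable is set.

```tcl
set line "Running final_step.step_done at: Wed Oct 11 02:04:03 MYT 2017"

# The outer RE from the question matches; note that (\S+\s) keeps the trailing space.
set matched [regexp {^Running (\S+\s)at: (\S+.*)$} $line match Stage StartDate]
puts "matched=$matched Stage=<$Stage> StartDate=<$StartDate>"

# Assumed value, for illustration only.
set CURRENT_STAGE final_step

# The bracketed form tries to run a command named final_step, which is not defined here,
# so the inner regexp raises an error instead of returning 0 or 1.
set failed [catch {regexp "[$CURRENT_STAGE]\.step_done" $Stage} msg]
puts "failed=$failed msg=$msg"
```

The first puts shows that the captures are fine, so the lack of output comes from the inner tests rather than from the outer RE.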

In particular, you've got a command substitution in there with the name of the command coming from a variable. That's… valid in some circumstances, but quite an advanced technique; are you sure that's what you want? Also, the CURRENT_STAGE variable appears to be undeclared: it is neither a parameter of the procedure nor brought into scope with global or variable, so reading it inside extract_data fails. I'd expect one of these approaches to be more likely to work:

Variable Substitution

Here, we're using the qualified version of the variable name. Note that the variable had better contain a valid regular expression fragment, and we need to double up the backslash (because we're in a double-quoted context and not a braced context; one backslash is for the basic Tcl language, and the other is for the RE engine).

regexp "$::CURRENT_STAGE\\.step_done" $Stage
regexp "^$::CURRENT_STAGE " $Stage
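
A minimal sketch of the variable-substitution form, assuming a global CURRENT_STAGE holding final_step (the question does not show its definition) and a hypothetical helper proc is_done. The last line shows why the doubled backslash matters:

```tcl
# Assumed global, for illustration only.
set ::CURRENT_STAGE final_step

# Hypothetical helper: does the captured stage name end the current stage?
proc is_done {Stage} {
    # $::CURRENT_STAGE reads the global directly, with no separate declaration.
    return [regexp "$::CURRENT_STAGE\\.step_done" $Stage]
}

puts [is_done "final_step.step_done "]
puts [is_done "final_stepXstep_done "]

# With a single backslash, Tcl drops the backslash before the RE engine sees it,
# so the dot is unescaped and matches any character.
puts [regexp "$::CURRENT_STAGE\.step_done" "final_stepXstep_done "]
```

This should print 1, 0 and 1: the doubled backslash makes the dot literal, while the single-backslash version also accepts the X.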

Command Substitution

Here, we're calling a command to get the actual stage. The command had better return a valid RE fragment, and as before, we're doubling up the backslash.

regexp "[CURRENT_STAGE]\\.step_done" $Stage
regexp "^[CURRENT_STAGE] " $Stage
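
A minimal sketch of the command-substitution form. The proc CURRENT_STAGE below is hypothetical and stands in for however the script actually determines its stage:

```tcl
# Hypothetical command returning the current stage as an RE fragment.
proc CURRENT_STAGE {} {
    return final_step
}

# Stage as captured by (\S+\s) in the main RE, including the trailing space.
set Stage "final_step.step_done "

puts [regexp "[CURRENT_STAGE]\\.step_done" $Stage]
puts [regexp "^[CURRENT_STAGE] " $Stage]
puts [regexp "^[CURRENT_STAGE] " "final_step "]
```

This should print 1, 0 and 1: the second test only succeeds when the stage name is followed directly by the captured space.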

In general, in both cases you might consider wrapping the part of the RE that represents the current stage in a non-capturing group, (?:...), as that doesn't really change the semantics much, but does mean that the RE fragment can use features like alternation safely. Not that it matters when the RE fragment is a simple thing like final_step.
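
To illustrate the grouping point, here is a small sketch with an assumed fragment that uses alternation. The variable name stage_re and the stage names in it are made up for the example:

```tcl
# Assumed RE fragment with alternation, for illustration only.
set stage_re {first_step|final_step}

# Without grouping, the alternation splits the whole RE into
# "^first_step" OR "final_step\.step_done", so this unrelated stage matches.
puts [regexp "^$stage_re\\.step_done" "first_step.other "]

# With a non-capturing group, the anchor and the suffix apply to both alternatives.
puts [regexp "^(?:$stage_re)\\.step_done" "first_step.other "]
puts [regexp "^(?:$stage_re)\\.step_done" "first_step.step_done "]
```

This should print 1, 0 and 1, which shows the ungrouped version accepting a stage it was not meant to.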

Upvotes: 1
