Semantics

Reputation: 164

Extracting a string between labels using regex

I have the following string:

------------------------------------------------------------------------
r100 | dawson | 2012-10-3 04:21:27 -0600 (Wed, 3 Oct 2012) | 8 lines
Changed paths:
   M /branches/project/foo.cpp
   A /branches/project/foo1.cpp
   D /branches/project/foo2.cpp

:SUMMARY: Add new file
:Module:

------------------------------------------------------------------------

What I am trying to do is build a list of all the files that changed in a particular commit. For that I first need to extract the text between the labels "Changed paths:" and ":SUMMARY:", but the regex solution I have is not very neat. When I run

set blocks [regexp -nocase -lineanchor -inline -all -- {^\s*?Changed paths\s*?:\s*?.*?:} $summary]

where $summary holds the string above, my output is:

{Changed paths:
   M /branches/project/foo.cpp
   A /branches/project/foo1.cpp
   D /branches/project/foo2.cpp

:}

Expected output:

   M /branches/project/foo.cpp
   A /branches/project/foo1.cpp
   D /branches/project/foo2.cpp

I can't seem to get rid of "Changed paths:". I don't have a lot of experience with this. Can anyone point out what I am doing wrong, and whether there is a way to store those changed files in a list?

Upvotes: 1

Views: 192

Answers (2)

Peter Lewerin

Reputation: 13272

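You could also match the file lines directly (based on your example). Either take every indented line:

regexp -line -inline -all {^[ \t]+\S.*$} $summary

or every line that ends in .cpp:

regexp -line -inline -all {^.*\.cpp$} $summary

The first pattern uses [ \t]+ rather than \s+ because \s also matches newlines, even with -line, so ^\s+ could start on a blank line and pull the following line into the match. The -nocase option is not needed here, since neither pattern depends on letter case.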
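To go from matched lines to separate fields, capture groups can be added to the line-based pattern. The following is only a sketch under a few assumptions: changedPaths is a hypothetical helper name, each entry is assumed to be a single uppercase status letter followed by a path, and paths are assumed to contain no spaces.

```tcl
# Sample log entry copied from the question.
set summary {------------------------------------------------------------------------
r100 | dawson | 2012-10-3 04:21:27 -0600 (Wed, 3 Oct 2012) | 8 lines
Changed paths:
   M /branches/project/foo.cpp
   A /branches/project/foo1.cpp
   D /branches/project/foo2.cpp

:SUMMARY: Add new file
:Module:

------------------------------------------------------------------------}

# Hypothetical helper: returns a flat list of action/path pairs.
# Assumes one uppercase status letter per entry and no spaces in paths.
proc changedPaths {summary} {
    set result {}
    # -line makes ^ and $ match at line boundaries; -all -inline returns,
    # for every hit, the full match followed by both capture groups.
    set matches [regexp -all -inline -line {^[ \t]+([A-Z])[ \t]+(\S+)$} $summary]
    foreach {full action path} $matches {
        lappend result $action $path
    }
    return $result
}

# Print one "action path" pair per line.
foreach {action path} [changedPaths $summary] {
    puts "$action $path"
}
```

Because the result is a flat list of pairs, it can be walked with foreach {action path} as shown, or loaded into an array with array set if lookup by action is more convenient (later entries with the same action would overwrite earlier ones).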

Upvotes: 0

Wiktor Stribiżew

Reputation: 627087

You need to wrap the part of the regex pattern that matches the substring you need in capturing parentheses, and then pass regexp a variable name that will receive the captured value:

Changed paths\s*?:\s*?(.*?):SUMMARY
                      ^^^^^

See the demo below:

set summary {------------------------------------------------------------------------
r100 | dawson | 2012-10-3 04:21:27 -0600 (Wed, 3 Oct 2012) | 8 lines
Changed paths:
   M /branches/project/foo.cpp
   A /branches/project/foo1.cpp
   D /branches/project/foo2.cpp

:SUMMARY: Add new file
:Module:

------------------------------------------------------------------------}
regexp {\n\s*?Changed paths\s*?:\s*?(.*?):SUMMARY} $summary - blocks
puts $blocks


If "Changed paths" appears at the very start of the string, use ^ instead of \n.

The $summary - blocks part means: we pass the $summary string to regexp, discard the whole match (the - is just a throwaway variable name), and assign the contents of capture group 1 to the blocks variable.
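The question also asks how to store the changed files in a list. One possible way, sketched here under the assumption that every non-blank line between the two labels is one entry, is to split the captured block on newlines and trim each line. The name changedFiles is a hypothetical helper, not part of Tcl.

```tcl
# Sample log entry copied from the question.
set summary {------------------------------------------------------------------------
r100 | dawson | 2012-10-3 04:21:27 -0600 (Wed, 3 Oct 2012) | 8 lines
Changed paths:
   M /branches/project/foo.cpp
   A /branches/project/foo1.cpp
   D /branches/project/foo2.cpp

:SUMMARY: Add new file
:Module:

------------------------------------------------------------------------}

# Hypothetical helper: returns the entries between "Changed paths:" and
# ":SUMMARY" as a Tcl list, one element per non-blank line.
proc changedFiles {summary} {
    set files {}
    # Capture everything between the two labels; "-" discards the full match.
    if {[regexp {Changed paths\s*?:(.*?):SUMMARY} $summary - block]} {
        foreach line [split $block \n] {
            set line [string trim $line]
            # Skip the blank lines around the path entries.
            if {$line ne ""} {
                lappend files $line
            }
        }
    }
    return $files
}

set files [changedFiles $summary]
puts [llength $files]
puts [lindex $files 0]
```

Each element keeps its status letter (for example "M /branches/project/foo.cpp"); if only the path is wanted, lindex on the element or a second capture group would be needed. When the labels are missing, the helper returns an empty list rather than raising an error.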

Upvotes: 1
