Reputation: 2868
I have a folder path like the following:
/h/apps/new/app/k1999
I want to remove the /app/k1999 part with the following regular expression:
set folder "/h/apps/new/app/k1999"
regsub {\/app.+$} $folder "" new_folder
But the result is /h: too many elements are being removed.
I noticed that I should use non-greedy matching, so I changed the code to:
regsub {\/app.+?$} $folder "" new_folder
but the result is still /h.
What's wrong with the above code?
Upvotes: 3
Views: 2698
Reputation: 13252
You can use a regular expression substitution operation to remove a directory suffix from a path name, but that doesn't mean you should.
file join {*}[lmap dir [file split $folder] {if {$dir ne {app}} {set dir} break}]
# -> /h/apps/new
A path name is a string, but more properly it's a list of directory names:
file split $folder
# -> / h apps new app k1999
What you want is the sublist of directory names up to, but not including, the directory named "app".
lmap dir [file split $folder] {if {$dir ne {app}} {set dir} break}
# -> / h apps new
(The directory name can be tested however you wish; a couple of possibilities are {$dir ni {foo app bar}} to stop at any of several alternative names, or {![string match app-* $dir]} to stop at any name beginning with "app-".)
And when you've gotten the list of directory names you wanted, you join the elements of it back to a path name again, as above.
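The split, truncate, join steps can also be wrapped in a small reusable procedure. This is only a sketch: the name truncate_path_at and its arguments are made up for illustration, and it uses foreach instead of lmap so that it should also run on Tcl 8.5.

```tcl
# Hypothetical helper (not from the answer above): keep the leading
# directories of $path up to, but not including, the first component
# that equals $stop.
proc truncate_path_at {path stop} {
    set kept {}
    foreach dir [file split $path] {
        if {$dir eq $stop} {
            break
        }
        lappend kept $dir
    }
    # file join needs at least one argument, so handle the empty case.
    if {[llength $kept] == 0} {
        return ""
    }
    return [file join {*}$kept]
}

puts [truncate_path_at /h/apps/new/app/k1999 app]   ;# -> /h/apps/new
puts [truncate_path_at /h/apps/new app]             ;# -> /h/apps/new
```

When no component matches, the path comes back unchanged, which mirrors what a non-matching regsub would do.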
So why should you do it this way instead of by using a regular expression substitution operation? This question illustrates the problem well. Unless one is an RE expert or takes great care to read the documentation, one is likely to formulate a regular expression based on a hunch. In the worst case, it works the first time. If not, one is tempted to tinker with it until it does. And any sufficiently ununderstood (yep, that is a word) RE will seem to work most of the time with occasional false positives and negatives to keep things interesting.
Split it, truncate it, join it. Can't go wrong. And if it does, it goes obviously wrong, forcing you to fix it.
Documentation: break, file, if, lmap, set
Upvotes: 2
Reputation: 36101
Non-greedy simply means that it will try to match the fewest characters possible and increase that number if the whole regex didn't match. The opposite, greedy, means that it will try to match as many characters as it can and reduce that number if the whole regex didn't match.
$ in regex means the end of the string. Therefore something.+$ and something.+?$ will be equivalent; one will simply do more retries before it matches.
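A quick way to check this against the path from the question is to capture what each pattern matches. The variable names greedy_match and lazy_match are only illustrative.

```tcl
set folder "/h/apps/new/app/k1999"

# Capture the text that each pattern actually matches.
regexp {/app.+$}  $folder greedy_match
regexp {/app.+?$} $folder lazy_match

puts $greedy_match   ;# -> /apps/new/app/k1999
puts $lazy_match     ;# -> /apps/new/app/k1999
```

Both matches start at the first /app and have to run to the end of the string, so removing either one leaves /h.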
In your case, /app.+ first matches at /apps, because that is the first occurrence of /app in your string. You can fix it by being more explicit and adding the / that follows /app:
regsub {/app/.+$} $folder "" new_folder
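As a rough check, regsub returns the number of substitutions it made when a result variable is given, so you can see whether the pattern matched at all. The second input below is a hypothetical path that ends in /app, used only to show the no-match case.

```tcl
set folder "/h/apps/new/app/k1999"

# With a variable name, regsub returns how many substitutions were made.
set count [regsub {/app/.+$} $folder "" new_folder]
puts "$count $new_folder"    ;# -> 1 /h/apps/new

# Hypothetical input with nothing after "/app": the pattern needs "/app/"
# plus at least one more character, so nothing is replaced.
set count2 [regsub {/app/.+$} "/h/apps/new/app" "" unchanged]
puts "$count2 $unchanged"    ;# -> 0 /h/apps/new/app
```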
Upvotes: 4
Reputation: 137567
The regular expression engine always starts matching as soon as it can; the greediness doesn't affect this. This means that in this case, it always starts matching too soon; you want the last match, not the first one.
If you use regexp -all -indices -inline, you can find out where the last match starts. That lets you then remove the part you actually don't want (e.g., by replacing it with an empty string):
set folder "/h/apps/new/app/k1999"
set indices [regexp -all -indices -inline {/app} $folder]
# This gets this value: {2 5} {11 14}
# If we have indices — if we had a match — we can do the rest of our processing
if {[llength $indices] > 0} {
    # Get the '11'; the first sub-element of the last element
    set index [lindex $indices end 0]
    # Replace '/app/k1999' with the empty string
    set newfolder [string replace $folder $index end ""]
} else {
    set newfolder $folder; # In case there's no match...
}
Upvotes: 2
Reputation: 626738
If you are looking to match app as a whole word, you can make use of the word boundaries, which in Tcl are \m and \M:
\m matches only at the beginning of a word
\M matches only at the end of a word
We only need \M here, because / is a non-word character and so \m is not needed:
set folder "/h/apps/new/app/k1999"
regsub {/app\M.+$} $folder "" newfolder
puts $newfolder
Result: /h/apps/new (we remove everything from the whole word app up to the end.)
If you want to remove just a part of the string inside the path, you can use a negated character class, [^/]+, to make sure you only match a single component of the path:
regsub {/app/[^/]+} $folder "" newfolder
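The difference between the two patterns only shows up when something follows k1999. A small sketch, assuming a hypothetical path with one extra component (logs) that is not in the question:

```tcl
# Hypothetical path with one more component after k1999.
set folder "/h/apps/new/app/k1999/logs"

# Remove everything from the whole word "app" to the end of the string.
regsub {/app\M.+$} $folder "" to_end
puts $to_end      ;# -> /h/apps/new

# Remove only "/app" and the single component that follows it.
regsub {/app/[^/]+} $folder "" one_part
puts $one_part    ;# -> /h/apps/new/logs
```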
Upvotes: 2