Reputation: 2701
Suppose I have this string:
"http://www.gnu.org/software/emacs/manual/html_node/emacs/index.html"
I would like to have a regular expression so that:
CL-USER> (some-regex "http://www.gnu.org/software/emacs/manual/html_node/emacs/index.html")
Would return:
"http://www.gnu.org/software/emacs/manual/html_node/emacs/"
If I used the same function again on the previous output:
CL-USER> (some-regex "http://www.gnu.org/software/emacs/manual/html_node/emacs/")
It would again get everything until the last "/":
"http://www.gnu.org/software/emacs/manual/html_node/"
Preferably, using cl-ppcre.
Upvotes: 1
Views: 786
Reputation: 51501
Your second example does not return everything up to the last slash, but everything up to the second-to-last slash. I guess you would rather not include the trailing slash, which makes this more regular. In simple cases, the regular expression could then be "(.*)/.*". However, this gets surprising once there is no path left:
CL-USER> (defun shorten-uri-string (s)
           (aref (nth-value 1 (cl-ppcre:scan-to-strings "(.*)/.*" s)) 0))
SHORTEN-URI-STRING
CL-USER> (shorten-uri-string
          "http://www.gnu.org/software/emacs/manual/html_node/emacs/index.html")
"http://www.gnu.org/software/emacs/manual/html_node/emacs"
CL-USER> (shorten-uri-string *)
"http://www.gnu.org/software/emacs/manual/html_node"
CL-USER> (shorten-uri-string *)
"http://www.gnu.org/software/emacs/manual"
CL-USER> (shorten-uri-string *)
"http://www.gnu.org/software/emacs"
CL-USER> (shorten-uri-string *)
"http://www.gnu.org/software"
CL-USER> (shorten-uri-string *)
"http://www.gnu.org"
CL-USER> (shorten-uri-string *)
"http:/"
I recommend parsing URIs and treating them as data structures, not as strings. The parser also knows which characters are allowed in each part of the URI.
For example, parse it with puri:
CL-USER> (defun shorten-uri-path (uri)
           (let* ((puri (puri:parse-uri uri))
                  (new-puri (puri:copy-uri puri)))
             (when (puri:uri-parsed-path puri)
               (setf (puri:uri-parsed-path new-puri)
                     (butlast (puri:uri-parsed-path puri))))
             new-puri))
SHORTEN-URI-PATH
CL-USER> (shorten-uri-path
          "http://www.gnu.org/software/emacs/manual/html_node/emacs/index.html")
#<PURI:URI http://www.gnu.org/software/emacs/manual/html_node/emacs>
CL-USER> (shorten-uri-path *)
#<PURI:URI http://www.gnu.org/software/emacs/manual/html_node>
CL-USER> (shorten-uri-path *)
#<PURI:URI http://www.gnu.org/software/emacs/manual>
CL-USER> (shorten-uri-path *)
#<PURI:URI http://www.gnu.org/software/emacs>
CL-USER> (shorten-uri-path *)
#<PURI:URI http://www.gnu.org/software>
CL-USER> (shorten-uri-path *)
#<PURI:URI http://www.gnu.org/>
CL-USER> (shorten-uri-path *)
#<PURI:URI http://www.gnu.org/>
You can render a URI to a stream with puri:render-uri. You can also deal explicitly with the query and fragment.
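As a rough sketch of both points, the following turns a URI object back into a string and drops the query and fragment. The helper names uri-to-string and strip-query-and-fragment are made up for illustration; the code assumes puri is already loaded and that puri:copy-uri accepts :query and :fragment keyword arguments, so check this against the puri version you use.

```lisp
;; Assumes puri is already loaded, e.g. (ql:quickload "puri").
;; Both function names below are hypothetical helpers, not part of puri.
(defun uri-to-string (uri)
  "Render URI (a puri URI object) into a fresh string."
  (with-output-to-string (out)
    (puri:render-uri uri out)))

(defun strip-query-and-fragment (uri)
  "Return a copy of URI (string or URI object) without query and fragment."
  ;; Assumption: COPY-URI takes :QUERY and :FRAGMENT overrides.
  (puri:copy-uri (puri:parse-uri uri) :query nil :fragment nil))
```

For example, (uri-to-string (strip-query-and-fragment "http://www.gnu.org/software/emacs?x=1#top")) should leave only the scheme, host, and path. The original values remain available through puri:uri-query and puri:uri-fragment on the parsed URI.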
Upvotes: 4