Pedro Delfino

Reputation: 2701

How to use a regular expression in Common Lisp to get everything in a string up to the last occurrence of "/"?

Suppose I have this string:

"http://www.gnu.org/software/emacs/manual/html_node/emacs/index.html"

I would like to have a regular expression so that:

CL-USER> (some-regex "http://www.gnu.org/software/emacs/manual/html_node/emacs/index.html")

Would return:

"http://www.gnu.org/software/emacs/manual/html_node/emacs/"

If I used the same function again on the previous output:

CL-USER> (some-regex "http://www.gnu.org/software/emacs/manual/html_node/emacs/")

It would again get everything until the last "/":

"http://www.gnu.org/software/emacs/manual/html_node/"

Preferably, using cl-ppcre.

Upvotes: 1

Views: 786

Answers (1)

Svante

Reputation: 51501

Your second example does not return everything up to the last slash, but everything up to the second-to-last slash. I guess that you don't want to include the trailing slash, which makes the behavior more regular. In simple cases, the regular expression (.*)/.* might then be enough. However, the result gets surprising once there is no path left:

CL-USER> (defun shorten-uri-string (s)
           (aref (nth-value 1 (cl-ppcre:scan-to-strings "(.*)/.*" s)) 0))
SHORTEN-URI-STRING
CL-USER> (shorten-uri-string
          "http://www.gnu.org/software/emacs/manual/html_node/emacs/index.html")
"http://www.gnu.org/software/emacs/manual/html_node/emacs"
CL-USER> (shorten-uri-string *)
"http://www.gnu.org/software/emacs/manual/html_node"
CL-USER> (shorten-uri-string *)
"http://www.gnu.org/software/emacs/manual"
CL-USER> (shorten-uri-string *)
"http://www.gnu.org/software/emacs"
CL-USER> (shorten-uri-string *)
"http://www.gnu.org/software"
CL-USER> (shorten-uri-string *)
"http://www.gnu.org"
CL-USER> (shorten-uri-string *)
"http:/"
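
The greedy (.*) group effectively finds the last slash and keeps whatever precedes it. As a minimal sketch, assuming nothing beyond standard Common Lisp, the same cut can be written with position and :from-end. The name shorten-before-last-slash is hypothetical and not part of cl-ppcre. Unlike the scan-to-strings version above, which would call aref on NIL for a string that contains no slash at all, this sketch returns such a string unchanged.

```lisp
;; Hypothetical helper for illustration; uses only standard Common Lisp.
(defun shorten-before-last-slash (s)
  "Return S up to, but excluding, its last slash.
Return S unchanged when it contains no slash."
  (let ((pos (position #\/ s :from-end t)))   ; search from the right
    (if pos
        (subseq s 0 pos)
        s)))

;; Same kind of call as in the transcript above:
(shorten-before-last-slash
 "http://www.gnu.org/software/emacs/manual/html_node/emacs/index.html")
```

It shares the problem shown above: once only "http://www.gnu.org" is left, the last slash belongs to the scheme separator, so the result is "http:/".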

I recommend treating a URI as a data structure rather than as a string, by parsing it. The parser also knows which characters are allowed or disallowed in each part of the URI.

For example, parse it with puri:

CL-USER> (defun shorten-uri-path (uri)
           (let* ((puri (puri:parse-uri uri))
                  (new-puri (puri:copy-uri puri)))
             (when (puri:uri-parsed-path puri)
               (setf (puri:uri-parsed-path new-puri)
                     (butlast (puri:uri-parsed-path puri))))
             new-puri))
SHORTEN-URI-PATH
CL-USER> (shorten-uri-path
          "http://www.gnu.org/software/emacs/manual/html_node/emacs/index.html")
#<PURI:URI http://www.gnu.org/software/emacs/manual/html_node/emacs>
CL-USER> (shorten-uri-path *)
#<PURI:URI http://www.gnu.org/software/emacs/manual/html_node>
CL-USER> (shorten-uri-path *)
#<PURI:URI http://www.gnu.org/software/emacs/manual>
CL-USER> (shorten-uri-path *)
#<PURI:URI http://www.gnu.org/software/emacs>
CL-USER> (shorten-uri-path *)
#<PURI:URI http://www.gnu.org/software>
CL-USER> (shorten-uri-path *)
#<PURI:URI http://www.gnu.org/>
CL-USER> (shorten-uri-path *)
#<PURI:URI http://www.gnu.org/>

You can render a URI to a stream with puri:render-uri. You can also explicitly deal with query and fragment.
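
As a rough sketch of that rendering step, assuming Quicklisp is available to load puri, the helper below collects the output of puri:render-uri into a string. The name uri-to-string is hypothetical and only used for illustration.

```lisp
;; Assumption: Quicklisp is installed; this loads puri before the
;; reader encounters the PURI: package prefix below.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload "puri"))

;; Hypothetical helper: render a parsed URI back into a string by
;; writing it to a string output stream.
(defun uri-to-string (uri)
  (with-output-to-string (out)
    (puri:render-uri uri out)))

;; Parse a string, then turn the URI object back into a string.
(uri-to-string (puri:parse-uri "http://www.gnu.org/software/emacs"))
```

This keeps the URI as a structured object while it is being manipulated and converts it to a string only at the edge, for example when printing or storing the result.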

Upvotes: 4
