Reputation: 6032
I'm having a hell of a problem reliably extracting the URL from an HTTP header using regex. It's not helped by the header alternately arriving with and without ^M characters, which don't seem to match the whitespace class. The best I've managed so far is:
(re-search-forward "^x-url: .*/\\{2,3\\}\\(.*\\)" nil t)
But of course that also picks up the ^M if it exists, as well as the URL parameters, which I don't really need. To give you an example from my debugging:
x-url: http://wiki/mediawiki/index.php?title=Vsmux&action=edit&redlink=1
x-url: http://wiki/mediawiki/index.php?title=Vsmux&action=edit&redlink=1^M
What I really want in both cases is just the result:
wiki/mediawiki/index.php
Upvotes: 1
Views: 710
Reputation: 6032
For completeness I should probably add another solution I've tried based on discussion with @wvxvw about using a proper parser. This renders to elisp code looking a bit like this:
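(save-excursion
  ;; Move past the header name, then grab the URL at that position.
  (let* ((url-string (url-get-url-at-point (re-search-forward "^x-url: ")))
         (url (url-generic-parse-url url-string))
         ;; Index of the "?" that starts the query string, or nil if none.
         (arg-split (string-match-p "\\?" (url-filename url))))
    ;; Host plus path, with any query string dropped.
    (format "%s%s" (url-host url)
            (if arg-split
                (substring (url-filename url) 0 arg-split)
              (url-filename url)))))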
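If the header value is already in hand as a string rather than sitting under point in a buffer, a string-only variant might look like the sketch below. The name `my/url-host-and-path` is made up for illustration, and I'm assuming `url-path-and-query` from url-parse.el is available (it ships with recent Emacs versions); it splits the filename slot into path and query, so there is no need to hunt for the "?" by hand. The trailing ^M is stripped up front.

```elisp
(require 'url-parse)

;; Hypothetical helper name; not part of the url library.
(defun my/url-host-and-path (url-string)
  "Return host plus path of URL-STRING, dropping any query string.
A trailing carriage return (^M) is removed before parsing."
  (let* ((clean (replace-regexp-in-string "\r+\\'" "" url-string))
         (url (url-generic-parse-url clean))
         ;; `url-path-and-query' returns (PATH . QUERY); either may be nil.
         (path (car (url-path-and-query url))))
    ;; `concat' treats a nil path as the empty string.
    (concat (url-host url) path)))
```

Calling it on either of the two sample header values from the question should give the same host-plus-path string, with or without the ^M.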
Upvotes: 2
Reputation:
This looks horrible, but I'm not responsible for how it looks - people who invented this idiotic standard are... But this should follow the standard (the old version, which didn't include Unicode characters and their translation) very closely:
"^x-url:\\s-*\\(\\w\\|\\+\\|-\\)+://\\(\\w\\|\\-\\)+\\(\\.\\w+\\)?\\(\\/\\(%[0-9a-fA-F]\\{2\\}\\|[~\\.A-Za-z_+-]*\\)*\\)*"
This is unless some "helpful" program already did translation from percent-encoded URI components into their original non-encoded form.
Also, there are some technical limits on how long the parts of the URL may be; I'm not going to try to implement that...
Also, it assumes that an authentication scheme, like the one in basic authentication, is never used. Otherwise it would be a whole lot easier to do it without a regular expression.
Upvotes: 3
Reputation: 53694
How about something like this (it assumes all URLs will have "://" in them):
(re-search-forward "^x-url: [^:]*://\\([^?\r\n]+\\).*?$")
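To actually get at the captured text, something along these lines may help. It is only a sketch: `my/x-url-host-and-path` is a made-up name, the trailing `.*?$` is left off because the capture group already stops at the first "?", CR or newline, and `nil t` is passed so a missing header returns nil instead of signalling an error.

```elisp
;; Hypothetical wrapper name, for illustration only.
(defun my/x-url-host-and-path ()
  "Return host and path from the next x-url header after point, or nil.
Point is left where it was."
  (save-excursion
    (when (re-search-forward "^x-url: [^:]*://\\([^?\r\n]+\\)" nil t)
      ;; Group 1 holds everything after \"://\" up to the query or line end.
      (match-string-no-properties 1))))

;; Try it on the CRLF-terminated sample line from the question.
(with-temp-buffer
  (insert "x-url: http://wiki/mediawiki/index.php"
          "?title=Vsmux&action=edit&redlink=1\r\n")
  (goto-char (point-min))
  (my/x-url-host-and-path))
```

The same call on the line without the ^M should return the same string, since the character class excludes both `\r` and `\n`.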
Upvotes: 2