Reputation: 22957
I have an Excel file with URLs of the form http://test.example.com/anything...
I want to reduce each of them to http://test.example.com
Does anyone know the regex I should use? (I already have a VB macro for the replacement, I just need the regex.)
Thanks
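' Early binding: needs a reference to "Microsoft VBScript Regular Expressions 5.5".
Public Function SearchNReplace1(Pattern1 As String, _
        Pattern2 As String, Replacestring As String, _
        TestString As String)
    Dim reg As New RegExp
    reg.IgnoreCase = True
    reg.MultiLine = False
    ' Pattern1 is a guard: only replace when the input matches it.
    reg.Pattern = Pattern1
    If reg.Test(TestString) Then
        ' Pattern2 is what actually gets replaced (first match only, Global is False).
        reg.Pattern = Pattern2
        SearchNReplace1 = reg.Replace(TestString, Replacestring)
    Else
        ' No match: return the input unchanged.
        SearchNReplace1 = TestString
    End If
End Function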
Upvotes: 1
Views: 2200
Reputation: 20333
from: ([a-z]+://[a-z0-9.-]+)[^ ]*
to: \1
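This will eat everything after the domain name until it encounters a space or the end of the string. Note that VBScript's RegExp.Replace writes the backreference as $1 rather than \1. Please give more details if this one does not suit you.
If you need IPv6 addresses as hosts, you have to allow the [, ] and : characters too: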
from: ([a-z]+://[a-z0-9.\[\]:-]+)[^ ]*
to: \1
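To try the first pattern outside Excel, here is a minimal sketch in Python rather than VB. The helper name strip_path and the sample URLs are made up for illustration, and re.IGNORECASE is only there to mirror reg.IgnoreCase = True from the macro in the question.

```python
import re

# First pattern from this answer; IGNORECASE mirrors reg.IgnoreCase = True.
HOST_ONLY = re.compile(r"([a-z]+://[a-z0-9.-]+)[^ ]*", re.IGNORECASE)

def strip_path(text):
    """Hypothetical helper: cut every URL in text down to scheme://host."""
    # Python spells the backreference \1; VBScript's RegExp would use $1.
    return HOST_ONLY.sub(r"\1", text)

print(strip_path("http://test.example.com/anything/else?x=1"))
print(strip_path("see http://test.example.com/a and https://Other.example.org/b"))
```

Because the host part stops at the first character outside [a-z0-9.-], a port such as :8080 is dropped together with the path.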
Upvotes: 3
Reputation: 34395
RFC 3986, Appendix B, gives us this regex for decomposing a generic URI:
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
Since you are interested in plucking out everything up to the path, here is an equivalent regex which should work quite nicely (in PHP syntax to allow comments):
$re = '%# Match URI and capture scheme and authority in $1.
    ^                    # Anchor to beginning of string.
    (                    # $1: Everything up to path.
      (?: [^:/?#]+:)?    # Optional scheme.
      (?://[^/?#]* )?    # Optional authority.
    )                    # End $1: Everything up to path.
    [^?#]*               # Required path (may be empty).
    (?:\? [^#]* )?       # Optional query.
    (?:\# .* )?          # Optional fragment.
    $                    # Anchor to end of string.
    %x';
And here is the exact same regex, in short form, that should work in VB:
myRegExp.Pattern = "^((?:[^:/?#]+:)?(?://[^/?#]*)?)[^?#]*(?:\?[^#]*)?(?:#.*)?$"
This regex does not validate the URI; it just decomposes it into its components and plucks the part you need into capture group 1. Note that every component except the path is optional (and the path itself may be empty). In other words, an empty string is a valid URI!
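As a quick sanity check of the short-form pattern, here is a minimal sketch in Python rather than VB. The helper name up_to_path and the sample inputs are assumptions for illustration; the pattern string is the one given above, unchanged.

```python
import re

# Short-form pattern from this answer, copied verbatim.
URI_PREFIX = re.compile(
    r"^((?:[^:/?#]+:)?(?://[^/?#]*)?)[^?#]*(?:\?[^#]*)?(?:#.*)?$"
)

def up_to_path(uri):
    """Hypothetical helper: return scheme plus authority (capture group 1)."""
    m = URI_PREFIX.match(uri)
    # Fall back to the input if the pattern does not match (e.g. embedded newlines).
    return m.group(1) if m else uri

print(up_to_path("http://test.example.com/anything?q=1#frag"))
print(repr(up_to_path("/just/a/path")))
```

The second call shows the consequence of every component being optional: a bare path still matches, and group 1 is simply empty.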
Upvotes: 0