Reputation: 2869
I asked a similar question recently about using regex to retrieve a URL or folder path from a string. I was looking at this comment by Dour High Arch, where he says:
"I recommend you do not use regexes at all; use separate code paths for URLs, using the Uri class, and file paths, using the FileInfo class. These classes already handle parsing, matching, extracting components, and so on."
I never really tried this, but now I am looking into it and can't figure out if what he said actually is useful to what I'm trying to accomplish.
I want to be able to parse a string message that could be something like:
"I placed the files on the server at http://www.thewebsite.com/NewStuff, they can also be reached on your local network drives at J:\Downloads\NewStuff"
And extract out the two strings http://www.thewebsite.com/
and J:\Downloads\NewStuff
. I don't see any methods on the Uri
or FileInfo
class that parse a Uri
or FileInfo
object from a string like I think Dour High Arch was implying.
Is there something I'm missing about using the Uri
or FileInfo
class that will allow this behavior? If not is there some other class in the framework that does this?
Upvotes: 1
Views: 2165
Reputation: 152
U can use :
(?<type>[^ ]+?:)(?<path>//[^ ]*|\\.+\\[^ ]*)
that will give you 2 groups on each result
type : "http:"
path : //www.thewebsite.com/NewStuff
and
type : "J:"
path : \Downloads\NewStuff
out of the string
"I placed the files on the server at http://www.thewebsite.com/NewStuff, they can also be reached on your local network drives at J:\Downloads\NewStuff"
you can use the "type" group to see if the type is http:
or not and set action on that.
EDIT
or use regex below if you are sure there is no whitespace in your filepath :
(?<type>[^ ]+?:)(?<path>//[^ ]*|\\[^ ]*)
Upvotes: 1
Reputation: 21712
It was not clear from your earlier question that you wanted to extract URL and file path substrings from larger strings. In that case, neither Uri.IsWellFormedUriString
nor rRegex.Match
will do what you want. Indeed, I do not think any simple method can do what you want because you will have to define rules for ambiguous strings like httX://wasThatAUriScheme/andAre/these part/of/aURL or/are they/separate.strings?andIsThis%20a%20Param?
My suggestion is to define a recursive descent parser and create states for each substring you need to distinguish.
Upvotes: 1
Reputation: 9467
I'd say the easiest way is splitting the strings into parts first.
First delimiter would be spaces, for each word - second would be qoutes (double and single)
Then use Uri.IsWellFormedUriString on each token.
So something like:
foreach(var part in String.Split(new char[]{''', '"', ' '}, someRandomText))
{
if(Uri.IsWellFormedUriString(part, UriKind.RelativeOrAbsolute))
doSomethingWith(part);
}
Just saw at URI.IseWellFormedURIString that this is a bit to strickt to suit your needs maybe. It returns false if www.Whatever.com is missing the http://
Upvotes: 1