z3nth10n

Reputation: 2441

Is there any way to get the file extension from a URL?

I want this so I can make sure that the file my script downloads has the extension I expect.

The file will probably not be at a URL like this:

http://example.com/this_url_will_download_a_file

It might be, but I think I will only use URLs of this kind:

http://example.com/file.jpg

I do not want to check it with Url.Substring(Url.LastIndexOf(".") - 3, 3), because that is a very poor approach.

So, what do you recommend?

Upvotes: 34

Views: 44123

Answers (8)

Ilan

Reputation: 664

The following code works on a wider range of URLs (absolute and relative), including URLs that have a fragment part (#x..) or a query part (?x..). Try it:

private static readonly char[] _fragmentAndQueryMarkers = ['?', '#'];
public static string? GetUrlPathExtension(string url)
{
    try
    {
        if (Uri.TryCreate(url, UriKind.RelativeOrAbsolute, out var uri))
        {
            // Get the path part directly, whether the URL is relative or absolute
            int pos;
            var path = uri.IsAbsoluteUri ? uri.LocalPath : (pos = url.IndexOfAny(_fragmentAndQueryMarkers)) < 0 ? url : url[..pos];

            // Find the last '/' and '.' in the path
            var i = path.Length - 1;
            while (i >= 0 && path[i] != '/' && path[i] != '.')
                i--;

            // If there's a '.', return the extension
            if (i >= 0 && path[i] == '.')
                return path[i..];
        }
    }
    catch { }

    return null;
}

Upvotes: 0

heringer

Reputation: 3188

It is weird, but it works:

string url = @"http://example.com/file.jpg";
string ext = System.IO.Path.GetExtension(url);
MessageBox.Show(this, ext);

But, as crono remarked below, it does not work when the URL has query parameters:

string url = @"http://example.com/file.jpg?par=x";
string ext = System.IO.Path.GetExtension(url);
MessageBox.Show(this, ext);

result: ".jpg?par=x"

Upvotes: 20

roxl

Reputation: 1

VirtualPathUtility.GetExtension(yourPath) (in the System.Web namespace) returns the file extension from the specified path, including the leading period.

Upvotes: 0

Alex from Jitbit

Reputation: 60642

Here's a simple one I use. It works with query parameters, with absolute and relative URLs, and so on.

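// Requires "using System.Linq;" for Last().
public static string GetFileExtensionFromUrl(string url)
{
    // Drop the query string, if any
    url = url.Split('?')[0];
    // Keep only the last path segment (the file name)
    url = url.Split('/').Last();
    // Return the extension including the leading dot, or "" if there is none
    return url.Contains('.') ? url.Substring(url.LastIndexOf('.')) : "";
}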

A unit test, if you like:

[TestMethod]
public void TestGetExt()
{
    Assert.IsTrue(Helpers.GetFileExtensionFromUrl("../wtf.js?x=wtf")==".js");
    Assert.IsTrue(Helpers.GetFileExtensionFromUrl("wtf.js")==".js");
    Assert.IsTrue(Helpers.GetFileExtensionFromUrl("http://www.com/wtf.js?wtf")==".js");
    Assert.IsTrue(Helpers.GetFileExtensionFromUrl("wtf") == "");
    Assert.IsTrue(Helpers.GetFileExtensionFromUrl("") == "");
}

Tune for your own needs.

P.S. Do not use Path.GetExtension, because it does not work with query-string parameters.

Upvotes: 20

stfno.me

Reputation: 916

I know that this is an old question, but this may be helpful to people who find it.

The best approach for getting the extension of a file name inside a URL, even one with parameters, is a regex.

You can use this pattern (not only for URLs):

.+(\.\w{3})\?*.*

Explanation:

.+      Match any character, one or more times
(...)   Create a capture group, so you can later read the text matched inside the parentheses
\.      Match the character '.'
\w{3}   Match exactly three word characters, i.e. [a-zA-Z0-9_]
\?*     Match the character '?', zero or more times
.*      Match any character, zero or more times

Example:

http://example.com/file.png
http://example.com/file.png?foo=10

But if you have a URL like this:

http://example.com/asd

the pattern takes '.com' as the extension.

So you can use a stricter pattern for URLs, like this:

.+\/{2}.+\/{1}.+(\.\w+)\?*.*

Explanation:

.+        Match any character, one or more times
\/{2}     Match two '/' characters
.+        Match any character, one or more times
\/{1}     Match one '/' character
.+        Match any character, one or more times
(\.\w+)   Group that matches a '.' followed by one or more word characters, i.e. [a-zA-Z0-9_]
\?*       Match the character '?', zero or more times
.*        Match any character, zero or more times

Example:

http://example.com/file.png          (Match .png)
https://example.com/file.png?foo=10  (Match .png)
http://example.com/asd               (No match)
C:\Foo\file.png                      (No match, URLs only!)

http://example.com/file.png

    http:        .+
    //           \/{2}
    example.com  .+
    /            \/{1}
    file         .+
    .png         (\.\w+)
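
To try the second pattern from C#, something like the following sketch should work. The class and method names are hypothetical, and returning an empty string when nothing matches is just one possible convention. The pattern itself is copied unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlRegexExtension
{
    // The URL-only pattern from this answer, copied as-is.
    private static readonly Regex UrlExtensionPattern =
        new Regex(@".+\/{2}.+\/{1}.+(\.\w+)\?*.*");

    // Hypothetical helper: returns the captured extension (with the leading dot),
    // or an empty string when the pattern does not match.
    public static string GetExtension(string url)
    {
        Match match = UrlExtensionPattern.Match(url);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }
}
```

One caveat: because the `.+` before the group is greedy, the group captures the last '.' in the whole string. A URL such as http://example.com/file.png?v=1.0 therefore yields ".0" rather than ".png", so strip the query string first if that can occur.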

Upvotes: 5

Sean T

Reputation: 2494

Some have suggested requesting the file from the url and checking the headers. That's overkill for something so simple in my opinion so...

heringer's answer fails if parameters are present in the URL. The fix is simple: split on the query-string character '?' and keep the first part.

string url = @"http://example.com/file.jpg?par=x";
string ext = System.IO.Path.GetExtension(url.Split('?')[0]); // ".jpg"

Upvotes: 3

Cedric Arnould

Reputation: 2393

Here is my solution:

if (Uri.TryCreate(url, UriKind.Absolute, out var uri)){
    Console.WriteLine(Path.GetExtension(uri.LocalPath));
}

First, I verify that the URL is a valid absolute URL; then I get the file extension from its local path, which does not include the query string.
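
Since the question is about checking that a download will have the wanted extension, the same idea can be wrapped in a small check. This is only a sketch: the class name, the HasExtension name, and the choice of a case-insensitive comparison are my assumptions.

```csharp
using System;
using System.IO;

public static class UrlExtensionCheck
{
    // Hypothetical helper: true when the path of an absolute URL ends with
    // the expected extension (pass it with the leading dot, e.g. ".jpg").
    public static bool HasExtension(string url, string expectedExtension)
    {
        // Relative or malformed URLs are rejected here.
        if (!Uri.TryCreate(url, UriKind.Absolute, out var uri))
            return false;

        // LocalPath excludes the query string and the fragment.
        string actual = Path.GetExtension(uri.LocalPath);
        return string.Equals(actual, expectedExtension, StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, HasExtension("http://example.com/file.jpg?par=x", ".jpg") is true, while a URL like http://example.com/this_url_will_download_a_file gives false because its path has no extension.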

Upvotes: 4

Justin

Reputation: 86729

If you just want to get the .jpg part of http://example.com/file.jpg then just use Path.GetExtension as heringer suggests.

// The following evaluates to ".jpg"
Path.GetExtension("http://example.com/file.jpg")

If the download link is something like http://example.com/this_url_will_download_a_file, then the filename will be sent in Content-Disposition, an HTTP header that is used to suggest a filename for browsers that display a "save file" dialog. If you want to get this filename, you can use the technique suggested by "Get filename without Content-Disposition": initiate the download and get the HTTP headers, but cancel the download without actually downloading any of the file.

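// Assumes `url` (the download link) and `defaultFileName` (a fallback name) are defined by the caller.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse res = (HttpWebResponse)request.GetResponse();
// The response stream is opened and disposed without being read, so the body is not downloaded.
using (Stream rstream = res.GetResponseStream())
{
    // Prefer Content-Disposition, then the Location header, then the final response URI.
    string fileName = res.Headers["Content-Disposition"] != null ?
        res.Headers["Content-Disposition"].Replace("attachment; filename=", "").Replace("\"", "") :
        res.Headers["Location"] != null ? Path.GetFileName(res.Headers["Location"]) :
        Path.GetFileName(url).Contains('?') || Path.GetFileName(url).Contains('=') ?
        Path.GetFileName(res.ResponseUri.ToString()) : defaultFileName;
}
res.Close();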
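
The Replace-based parsing above assumes the header has exactly the form attachment; filename=.... As a hedged alternative, the header value can be parsed with ContentDispositionHeaderValue from System.Net.Http.Headers and the extension taken from the parsed file name. The class and method names below are hypothetical, and this sketch ignores the filename* parameter.

```csharp
using System;
using System.IO;
using System.Net.Http.Headers;

public static class ContentDispositionExtension
{
    // Hypothetical helper: returns the extension (with the leading dot) of the
    // file name in a Content-Disposition header value, or "" if there is none.
    public static string GetExtension(string contentDisposition)
    {
        if (!ContentDispositionHeaderValue.TryParse(contentDisposition, out var parsed))
            return string.Empty;

        // FileName may still carry the surrounding quotes, so trim them.
        string fileName = (parsed.FileName ?? string.Empty).Trim('"');
        return Path.GetExtension(fileName);
    }
}
```

The string passed in would be the value of res.Headers["Content-Disposition"] from the snippet above, after checking it for null.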

Upvotes: 4
