Ian Boyd

Reputation: 256761

Get URL parts in winapi

Is there an API in Windows that can crack a url into parts?

Background

The format of a URL is:

stackoverflow://iboyd:password01@mail.stackoverflow.com:12386/questions/SubmitQuestion.aspx?useLiveData=1&internal=0#nose
\___________/   \___/ \________/ \____________________/ \___/ \___________________________/\_______________________/ \__/
     |            |       |               |               |                |                          |                |
   scheme     username password        hostname          port             path                      query           fragment

Is there a function in the (native) Win32 API that can crack a URL into these parts?

Some functions don't work

There are some functions in the Windows API, but they don't do the job: they only understand the schemes that WinHTTP can use, and both fail on URLs with any other scheme.

WinHttpCrackUrl actively refuses to crack URLs with any other scheme:

If the Internet protocol of the URL passed in for pwszUrl is not HTTP or HTTPS, then WinHttpCrackUrl returns FALSE and GetLastError indicates ERROR_WINHTTP_UNRECOGNIZED_SCHEME.

Is there another native API in Windows that can get parts of a url?

Bonus Chatter

Here's how you do it in the CLR (e.g. C#):

using System;

public class Program
{
    public static void Main()
    {
        var uri = new Uri("stackoverflow://iboyd:password01@mail.stackoverflow.com:12386/questions/SubmitQuestion.aspx?useLiveData=1&internal=0#nose");

        Console.WriteLine("Uri.Scheme: "+uri.Scheme);
        Console.WriteLine("Uri.UserInfo: "+uri.UserInfo);
        Console.WriteLine("Uri.Host: "+uri.Host);
        Console.WriteLine("Uri.Port: "+uri.Port);
        Console.WriteLine("Uri.AbsolutePath: "+uri.AbsolutePath);
        Console.WriteLine("Uri.Query: "+uri.Query);
        Console.WriteLine("Uri.Fragment: "+uri.Fragment);
    }
}

Outputs

Uri.Scheme: stackoverflow
Uri.UserInfo: iboyd:password01
Uri.Host: mail.stackoverflow.com
Uri.Port: 12386
Uri.AbsolutePath: /questions/SubmitQuestion.aspx
Uri.Query: ?useLiveData=1&internal=0
Uri.Fragment: #nose

Upvotes: 1

Views: 2819

Answers (2)

IInspectable

Reputation: 51413

Is there an API in Windows that can crack a url into parts?

There is in Windows 10. The Uri class in the Windows Runtime is capable of decomposing a URI into its individual parts. It is not strictly part of the Windows API, but it is consumable by any Windows API application.

The following code illustrates its usage. It is written using the C++/WinRT language projection, requiring a C++17 compiler. If you cannot switch to a C++17 compiler, you can use the Windows Runtime C++ Template Library (WRL) instead to consume the Windows Runtime APIs.

#include <iostream>
#include <string>
#include <winrt/Windows.Foundation.h>

#pragma comment(lib, "WindowsApp.lib")

using namespace winrt;
using namespace Windows::Foundation;

int wmain(int argc, wchar_t* wargv[])
{
    if (argc != 2)
    {
        std::wcout << L"Usage:\n  UrlCracker <url>" << std::endl;
        return 1;
    }

    init_apartment();

    Uri const uri{ wargv[1] };
    std::wcout << L"Scheme: " << uri.SchemeName().c_str() << std::endl;
    std::wcout << L"Username: " << uri.UserName().c_str() << std::endl;
    std::wcout << L"Password: " << uri.Password().c_str() << std::endl;
    std::wcout << L"Host: " << uri.Host().c_str() << std::endl;
    std::wcout << L"Port: " << std::to_wstring(uri.Port()) << std::endl;
    std::wcout << L"Path: " << uri.Path().c_str() << std::endl;
    std::wcout << L"Query: " << uri.Query().c_str() << std::endl;
    std::wcout << L"Fragment: " << uri.Fragment().c_str() << std::endl;
}

This program digests any URI spelled out in the question. Using the input

stackoverflow://iboyd:password01@mail.stackoverflow.com:12386/questions/SubmitQuestion.aspx?useLiveData=1&internal=0#nose

produces the following output:

Scheme: stackoverflow
Username: iboyd
Password: password01
Host: mail.stackoverflow.com
Port: 12386
Path: /questions/SubmitQuestion.aspx
Query: ?useLiveData=1&internal=0
Fragment: #nose

Error handling has been omitted. If the Uri constructor is passed an invalid string, it throws an exception of type winrt::hresult_error. If you cannot use exceptions in your code, you can activate the type manually (e.g. using the WRL), and inspect the HRESULT return values instead.

Upvotes: 1

Ian Boyd

Reputation: 256761

There are a number of functions available to native Windows developers:

Of these, InternetCrackUrl works.

URL_COMPONENTS components = {}; //zero-initialize, so every lpsz* pointer starts out null
components.dwStructSize      = sizeof(URL_COMPONENTS);
//A null pointer with a non-zero length asks InternetCrackUrl to return a pointer
//into url (plus the real length) instead of copying the component into a buffer
components.dwSchemeLength    = DWORD(-1);
components.dwHostNameLength  = DWORD(-1);
components.dwUserNameLength  = DWORD(-1);
components.dwPasswordLength  = DWORD(-1);
components.dwUrlPathLength   = DWORD(-1);
components.dwExtraInfoLength = DWORD(-1);

if (!InternetCrackUrl(url, url.Length, 0, ref components))
    RaiseLastOSError();

//The returned pointers are not null-terminated, so copy exactly dw*Length characters
String scheme    = StrLCopy(components.lpszScheme, components.dwSchemeLength);
String username  = StrLCopy(components.lpszUserName, components.dwUserNameLength);
String password  = StrLCopy(components.lpszPassword, components.dwPasswordLength);
String host      = StrLCopy(components.lpszHostName, components.dwHostNameLength);
Int32  port      = components.nPort;
String path      = StrLCopy(components.lpszUrlPath, components.dwUrlPathLength);
String extraInfo = StrLCopy(components.lpszExtraInfo, components.dwExtraInfoLength);

This means that

stackoverflow://iboyd:password01@mail.stackoverflow.com:12386/questions/SubmitQuestion.aspx?useLiveData=1&internal=0#nose

is parsed into:

  • Scheme: stackoverflow
  • Username: iboyd
  • Password: password01
  • Host: mail.stackoverflow.com
  • Port: 12386
  • Path: /questions/SubmitQuestion.aspx
  • ExtraInfo: ?useLiveData=1&internal=0#nose

Parsing ExtraInfo into query and fragment

It sucks that InternetCrackUrl doesn't make a distinction between:

?query#fragment

and just mashes them together as ExtraInfo:

  • ExtraInfo: ?useLiveData=1&internal=0#nose
    • Query: ?useLiveData=1&internal=0
    • Fragment: #nose

So we have to do some splitting if we want the ?query or the #fragment:

/*
   InternetCrackUrl returns ?query#fragment in a single combined extraInfo field.
   Split that into separate
      ?query
      #fragment
*/
String query = extraInfo;
String fragment = "";

Int32 n = StrPos("#", extraInfo);
if (n >= 1) //one-based string indexes
{
   query = extraInfo.SubString(1, n-1);
   fragment = extraInfo.SubString(n, MaxInt);
}

Giving us the final desired result:

  • Scheme: stackoverflow
  • Username: iboyd
  • Password: password01
  • Host: mail.stackoverflow.com
  • Port: 12386
  • Path: /questions/SubmitQuestion.aspx
  • ExtraInfo: ?useLiveData=1&internal=0#nose
    • Query: ?useLiveData=1&internal=0
    • Fragment: #nose
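
For anyone doing this from C++ rather than the pseudocode above, the same split can be sketched with the standard library alone. This is only an illustration under a few assumptions: SplitExtraInfo is a made-up helper name (it is not part of WinINet or any other Windows API), and it assumes the combined ExtraInfo text has already been copied out of lpszExtraInfo into a std::wstring.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not a Windows API): splits the combined
// "?query#fragment" text at the first '#'.
// Returns { query, fragment }; each part keeps its leading '?' or '#',
// and a missing part comes back as an empty string.
std::pair<std::wstring, std::wstring> SplitExtraInfo(const std::wstring& extraInfo)
{
    const std::size_t hash = extraInfo.find(L'#');
    if (hash == std::wstring::npos)
    {
        // No '#' at all: everything is the query, there is no fragment.
        return { extraInfo, std::wstring() };
    }

    // Everything before the '#' is the query (possibly empty);
    // the '#' and everything after it is the fragment.
    return { extraInfo.substr(0, hash), extraInfo.substr(hash) };
}
```

Calling SplitExtraInfo(L"?useLiveData=1&internal=0#nose") gives the pair ?useLiveData=1&internal=0 and #nose, matching the Query and Fragment entries listed above. Splitting at the first # (rather than the last) treats any later # characters as part of the fragment.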

Upvotes: 2
