Victor

Reputation: 14593

How to normalize a URL?

I am dealing with a situation where I need users to enter various URLs (for example: for their profiles). However, users do not always insert URLs in the https://example.com format. They might insert something like:

How can I normalize the URLs to a format that can potentially lead to a web address? I see this behavior in web browsers. We almost always enter crappy things in a web browser's bar and they can distinguish whether that's a search or something that can be turned into a URL.

I have looked in many places but could not find an approach to this.

I would prefer a solution written for Node if it's possible. Thank you very much!

Upvotes: 10

Views: 22063

Answers (3)

mpen

Reputation: 282885

Here's an easy way:

function normalizeUrl(url: string): string {
    const urlObj = new URL(url, window.location.href)
    urlObj.searchParams.sort()
    return urlObj.href
}
normalizeUrl('foo?b=2&a=1#bar')  // 'http://localhost:5173/foo?a=1&b=2#bar'

I've sorted the query params because usually the order of those doesn't matter. Relative URLs will be expanded.

If you're using Node.js, you might want to use import.meta.url instead of window.location.href, or delete the arg and just make sure you always pass fully qualified URLs.
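
As a rough sketch of the Node.js variant described above, the base URL can be passed in explicitly instead of being read from window.location.href. The `base` parameter and the example.com URLs are assumptions for illustration only.

```javascript
// Sketch only: same idea as the function above, with the base URL as an
// explicit (hypothetical) `base` parameter so it also runs outside a browser.
function normalizeUrl(url, base) {
  // With a base, relative URLs are resolved against it; without one,
  // `url` must already be fully qualified or the constructor throws.
  const urlObj = base === undefined ? new URL(url) : new URL(url, base);
  urlObj.searchParams.sort(); // order of query params usually doesn't matter
  return urlObj.href;
}

console.log(normalizeUrl('foo?b=2&a=1#bar', 'https://example.com/dir/'));
// 'https://example.com/dir/foo?a=1&b=2#bar'

console.log(normalizeUrl('HTTPS://Example.COM:443/a/../b?z=1&y=2'));
// 'https://example.com/b?y=2&z=1'
```

The second call shows what the URL parser normalizes on its own: scheme and host are lowercased, the default port is dropped, and dot segments in the path are resolved.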

Upvotes: 1

Sindre Sorhus

Reputation: 63487

You want the normalize-url package:

const normalizeUrl = require('normalize-url');

normalizeUrl('example.com/');
//=> 'http://example.com'

It runs a bunch of normalizations on the URL.
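
To give a feel for the two normalizations visible in the example above (adding a default protocol and dropping the trailing slash), here is a rough approximation using only the built-in URL class. This is not the package's code; `roughNormalize` and its `defaultProtocol` parameter are hypothetical names, and the package does considerably more than this.

```javascript
// Hypothetical helper, NOT part of normalize-url: a built-in-only sketch.
function roughNormalize(input, defaultProtocol = 'http:') {
  const trimmed = input.trim();
  // Prepend a protocol only when the input has no "scheme://" prefix.
  const hasScheme = /^[a-z][a-z\d+.-]*:\/\//i.test(trimmed);
  const url = new URL(hasScheme ? trimmed : `${defaultProtocol}//${trimmed}`);
  // Strip trailing slashes from the path. Rebuilding from `origin` also
  // drops any username/password, since `origin` never includes them.
  const path = url.pathname.replace(/\/+$/, '');
  return url.origin + path + url.search + url.hash;
}

console.log(roughNormalize('example.com/'));
// 'http://example.com'

console.log(roughNormalize('HTTPS://Example.com/a/b/?x=1'));
// 'https://example.com/a/b?x=1'
```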

Upvotes: 5

nowy

Reputation: 394

Use Node's URL API, alongside some manual checks.

  1. Manually check that the URL has a valid protocol.
  2. Instantiate the URL.
  3. Check that the URL does not contain additional information.

Example code:

const { URL } = require('url')
let myTestUrl = 'https://user:pass@sub.host.com:8080/p/a/t/h?query=string#hash';

try {
  if (!myTestUrl.startsWith('https://') && !myTestUrl.startsWith('http://')) {
    // The following line is based on the assumption that the URL will resolve using https.
    // Ideally, after all checks pass, the URL should be pinged to verify the correct protocol.
    // Better yet, it should need to be provided by the user - there are nice UX techniques to address this.
    myTestUrl = `https://${myTestUrl}`
  }

  const normalizedUrl = new URL(myTestUrl);

  if (normalizedUrl.username !== '' || normalizedUrl.password !== '') {
    throw new Error('Username and password not allowed.')
  }

  // Do your thing
} catch (e) {
  console.error('Invalid url provided', e)
}

This example only handles http and https, which is enough to convey the gist.
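
The three steps could also be wrapped into a reusable function, roughly as sketched here. The name `toProfileUrl` and the choice to return null on rejection are assumptions for illustration, and the https default carries the same caveat as in the example above.

```javascript
const { URL } = require('url');

// Hypothetical helper: returns the normalized href, or null if the input
// is rejected.
function toProfileUrl(input) {
  let candidate = String(input).trim();

  // Step 1: make sure there is an http(s) protocol. Defaulting to https
  // is an assumption; the site may only be reachable over http.
  if (!/^https?:\/\//i.test(candidate)) {
    candidate = `https://${candidate}`;
  }

  // Step 2: instantiate the URL; the constructor throws on invalid input.
  let parsed;
  try {
    parsed = new URL(candidate);
  } catch {
    return null;
  }

  // Step 3: reject URLs that carry credentials.
  if (parsed.username !== '' || parsed.password !== '') {
    return null;
  }

  return parsed.href;
}

console.log(toProfileUrl('example.com'));              // 'https://example.com/'
console.log(toProfileUrl('HTTP://Example.com/Path'));  // 'http://example.com/Path'
console.log(toProfileUrl('https://user:pass@sub.host.com:8080/p')); // null
console.log(toProfileUrl('not a url'));                // null
```

Returning null instead of throwing keeps the calling code simple when the value comes straight from a form field; throwing a descriptive error would work just as well.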

Straight from the docs, a nice visualisation of the API:

┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│                                            href                                             │
├──────────┬──┬─────────────────────┬─────────────────────┬───────────────────────────┬───────┤
│ protocol │  │        auth         │        host         │           path            │ hash  │
│          │  │                     ├──────────────┬──────┼──────────┬────────────────┤       │
│          │  │                     │   hostname   │ port │ pathname │     search     │       │
│          │  │                     │              │      │          ├─┬──────────────┤       │
│          │  │                     │              │      │          │ │    query     │       │
"  https:   //    user   :   pass   @ sub.host.com : 8080   /p/a/t/h  ?  query=string   #hash "
│          │  │          │          │   hostname   │ port │          │                │       │
│          │  │          │          ├──────────────┴──────┤          │                │       │
│ protocol │  │ username │ password │        host         │          │                │       │
├──────────┴──┼──────────┴──────────┼─────────────────────┤          │                │       │
│   origin    │                     │       origin        │ pathname │     search     │ hash  │
├─────────────┴─────────────────────┴─────────────────────┴──────────┴────────────────┴───────┤
│                                            href                                             │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
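
To connect the diagram to actual values, this short sketch parses the same URL with the WHATWG URL class (the lower half of the diagram) and collects the properties into an object. The variable names `u` and `parts` are illustrative only.

```javascript
const { URL } = require('url');

const u = new URL('https://user:pass@sub.host.com:8080/p/a/t/h?query=string#hash');

// Each property corresponds to a labelled segment in the diagram above.
const parts = {
  protocol: u.protocol, // 'https:'
  username: u.username, // 'user'
  password: u.password, // 'pass'
  host: u.host,         // 'sub.host.com:8080'
  hostname: u.hostname, // 'sub.host.com'
  port: u.port,         // '8080'
  origin: u.origin,     // 'https://sub.host.com:8080'
  pathname: u.pathname, // '/p/a/t/h'
  search: u.search,     // '?query=string'
  hash: u.hash,         // '#hash'
};

console.log(parts);
```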

Upvotes: 9
