Costantin

Reputation: 2656

Get base url from string with Regex and Javascript

I'm trying to get the base URL from a string (so no window.location).

In other words, all of the following should return https://apple.com (or https://www.apple.com for the last one).

These are just examples; URLs can have different subdomains. For instance, https://shop.apple.co.uk/?query=foo should return https://shop.apple.co.uk. It could be any URL, such as https://foo.bar.

The closest I got is:

const baseUrl = url.replace(/^((\w+:)?\/\/[^\/]+\/?).*$/,'$1').replace(/\/$/, ""); // Base Path & Trailing slash

But this doesn't work with anchor links and query strings that start right after the host, without a / before them.
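To make the failure concrete, here is a minimal sketch. The wrapper name questionBaseUrl and the second sample URL are assumptions added for illustration; the pattern itself is the one shown above.

```javascript
// The pattern from the question, wrapped in a hypothetical function for testing.
function questionBaseUrl(url) {
  return url.replace(/^((\w+:)?\/\/[^\/]+\/?).*$/, "$1").replace(/\/$/, "");
}

// Works: a "/" separates the host from the query string.
console.log(questionBaseUrl("https://shop.apple.co.uk/?query=foo"));
// https://shop.apple.co.uk

// Fails: [^\/]+ also consumes "?query=true", so nothing is stripped.
console.log(questionBaseUrl("https://apple.com?query=true"));
// https://apple.com?query=true
```

The negated class [^\/] only stops at a slash, so a ? or # directly after the host is treated as part of the host.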

Any idea how I can get it to work in all cases?

Upvotes: 2

Views: 5007

Answers (4)

Matt Morgan

Reputation: 5303

You could use the Web API's built-in URL class for this. URL also gives you easy access to other parsed properties, such as the query string parameters and the protocol.

Regex is a painful way to do something that the browser makes otherwise very simple.

I know that you asked about using regex, but in the event that you (or someone coming here in the future) really just care about getting the information out and aren't committed to using regex, maybe this answer will help.

let one = "https://apple.com?query=true&slash=false"
let two = "https://apple.com#anchor=true&slash=false"
let three = "http://www.apple.com/#anchor=true&slash=true&whatever=foo"

let urlOne = new URL(one)
console.log(urlOne.origin)

let urlTwo = new URL(two)
console.log(urlTwo.origin)

let urlThree = new URL(three)
console.log(urlThree.origin)
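One caveat worth sketching: the URL constructor throws a TypeError when the string cannot be parsed as an absolute URL. The helper below, getOrigin, is a hypothetical wrapper (not part of the answer) that returns null in that case instead of throwing.

```javascript
// Hypothetical helper: returns the origin of an absolute URL string,
// or null when the URL constructor rejects the input.
function getOrigin(input) {
  try {
    return new URL(input).origin;
  } catch (err) {
    // new URL() throws a TypeError for strings it cannot parse.
    return null;
  }
}

console.log(getOrigin("https://apple.com?query=true&slash=false")); // https://apple.com
console.log(getOrigin("https://shop.apple.co.uk/?query=foo"));      // https://shop.apple.co.uk
console.log(getOrigin("not a url"));                                // null
```

Returning null is just one design choice; rethrowing or returning the input unchanged may suit other callers better.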

Upvotes: 4

user875234

Reputation: 2517

This will get you everything up to the .com part. You will have to append .com once you pull out the first part of the url.

^http.*?(?=\.com)

Or maybe you could do:

myUrl.replace(/(#|\?|\/#).*$/, "")

To remove everything after the host name.

Upvotes: 0

The fourth bird

Reputation: 163207

You could add # and ? to your negated character class. You don't need .* because that will match until the end of the string.

For your example data, you could match:

^https?:\/\/[^#?\/]+


const strings = [
    "https://apple.com?query=true&slash=false",
    "https://apple.com#anchor=true&slash=false",
    "http://www.apple.com/#anchor=true&slash=true&whatever=foo",
    "https://foo.bar/?q=true"
];

strings.forEach(s => {
    console.log(s.match(/^https?:\/\/[^#?\/]+/)[0]);
})
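Note that match returns null when the pattern does not match, so indexing with [0] throws on inputs such as a string without an http or https scheme. A minimal null-safe sketch, using a hypothetical helper name matchBaseUrl and the same pattern:

```javascript
// Hypothetical helper: applies the pattern from this answer and
// returns null instead of throwing when the input does not match.
function matchBaseUrl(s) {
  const m = s.match(/^https?:\/\/[^#?\/]+/);
  return m ? m[0] : null;
}

console.log(matchBaseUrl("https://apple.com#anchor=true&slash=false")); // https://apple.com
console.log(matchBaseUrl("https://foo.bar/?q=true"));                   // https://foo.bar
console.log(matchBaseUrl("ftp://example.com/file"));                    // null
```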

Upvotes: 4

sui

Reputation: 781

    // Lazy quantifiers stop at the first ?, / or # after the host rather than the last one.
    const baseUrl = url.replace(/^(.*?:\/\/.*?)[?\/#].*$/, '$1');

Upvotes: 0
