user2734006

Removing http:// or https:// and www

I am trying to build a url cleaner.

I want to take a list of URLs and remove any leading https://, http://, www., etc., as well as everything from the first / after the domain onward.

I have tried the following regex: url.replace(/^https?\:\/\/www\./i, "").split('/')[0];

This works to a certain extent and outputs the following

"www.net-temps.com"
"www.toplanguagejobs.com"
"http:"
"peopleready.com"
"nationjob.com"
"http:"
"bluesteps.com"
"https:"
"theguardian.com"
"reddit.com"
"youtube.com"
"https:"
"pgatour.com"
"cultofmac.com"

from the following list:

'www.net-temps.com',
'www.toplanguagejobs.com',
'http://nychires.com/',
'http://www.peopleready.com/',
'https://www.nationjob.com/',
'http://nationaljobsonline.com/',
'https://www.bluesteps.com/',
'https://medium.freecodecamp.com/how-we-got-our-2-year-old-open-source-project-to-trend-on-github-8c25b0a6dfe9#.nl4985bjz',
'https://www.theguardian.com/uk/business',
'https://www.reddit.com/r/funny/comments/5qzkz4/my_captain_friend_sent_me_this_photo_saudi_prince/',
'https://www.youtube.com/watch?v=Bua8k_CcnuI',
'https://stackoverflow.com/questions/7000995/jquery-removing-part-of-string-after-and-removing-too/7001040#7001040',
'http://www.pgatour.com/fantasy.html',
'http://www.cultofmac.com/464645/apple-spaceship-campus-flyover/'

If I remove the /www\. from the regex, this works well and removes all https:// etc., but I'd also like to remove the www. if it's there, whether or not the URL starts with http:// or https://.

This is what I have coded so far:

https://jsfiddle.net/xba5x9ro/1/

Once this is sorted, I would like to take a list of URLs from a textarea, run makeDomainBeautiful on them, and output the result to another textarea, but I thought I'd get this working first.

Upvotes: 11

Views: 23213

Answers (6)

tklodd

Reputation: 1079

Don't use a regular expression here; there is no need, and it opens up the possibility of making mistakes. Use JavaScript's native URL constructor instead:

let url = new URL('https://stackoverflow.com/questions/7000995/jquery-removing-part-of-string-after-and-removing-too/7001040#7001040');
let hostname = url.hostname;
console.log(hostname); // stackoverflow.com

https://developer.mozilla.org/en-US/docs/Web/API/URL/URL

Addendum: To remove the www subdomain specifically, you can do this:

let hostname_without_www = hostname.replace(/^www\./, "")
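
One thing to watch for with this approach: new URL() throws a TypeError when the string has no scheme, and the question's list contains entries such as 'www.net-temps.com'. A minimal sketch of one way around that, assuming it is acceptable to prepend http:// to scheme-less input (the helper name cleanHostname is made up for this illustration and is not part of the answer above):

```javascript
// Hypothetical helper: returns the hostname without a leading "www.".
// Assumption: input without a scheme can be treated as http://<input>.
function cleanHostname(input) {
  const trimmed = input.trim();
  // Does the string already start with something like "http://" or "https://"?
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  // new URL() needs an absolute URL, so add a scheme when one is missing.
  const url = new URL(hasScheme ? trimmed : 'http://' + trimmed);
  // hostname excludes the port, path, query string and fragment.
  return url.hostname.replace(/^www\./i, '');
}

console.log(cleanHostname('www.net-temps.com'));
console.log(cleanHostname('https://www.youtube.com/watch?v=Bua8k_CcnuI'));
```

Strings that still cannot be parsed as a URL after the scheme is added will throw, so wrap the call in try/catch if the list may contain junk.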

Upvotes: 2

RPJaisawal

Reputation: 1

You can remove the first '/' after the domain and everything following it without using the split function. Just modify the regex a bit:

const url = 'https://example.com/';
const regex = /^(?:https?:\/\/)?(?:www\.)?|\/.*$/gi;
console.log(url.replace(regex, ""));
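
To see what the alternation does on inputs other than https://example.com/, here is a small sketch that runs the same pattern over a few URLs from the question. With the g flag, the first alternative strips the optional scheme and www. at the start of the string, and the second strips everything from the next / to the end. The helper name stripToHost is only for illustration:

```javascript
// Same pattern as above: optional scheme and "www." at the start,
// OR everything from the first remaining "/" to the end of the string.
const regex = /^(?:https?:\/\/)?(?:www\.)?|\/.*$/gi;

// Hypothetical helper name, used only for this example.
function stripToHost(url) {
  return url.replace(regex, "");
}

const samples = [
  'www.net-temps.com',
  'http://nychires.com/',
  'http://www.pgatour.com/fantasy.html',
  'https://www.youtube.com/watch?v=Bua8k_CcnuI'
];

samples.forEach(function (sample) {
  console.log(sample + ' -> ' + stripToHost(sample));
});
```

Note that a query string or fragment that follows the domain directly, with no / in between, is not removed by this pattern.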

Upvotes: -1

Hitesh Limbani

Reputation: 7

var url = prompt("url: ");

url = url.replace(/^(?:https?:\/\/)?(?:www\.)?/i, "").split('/')[0];

alert("url: " + url);

Upvotes: -2

SirPhemmiey

Reputation: 641

This will take care of http://, https:// and www. at the start of the URL:

url.replace(/^(?:https?:\/\/)?(?:www\.)?/i, "").split('/')[0]

Upvotes: 0

pureth

Reputation: 828

This is based on ibrahim mahrir's answer, for the case where you just want to trim http:// or https:// and www. from the start of the URL but keep the rest. I mocked it up in CodePen to test it, and it seems to work nicely: https://codepen.io/pureth/pen/LQOaPz

var regex = /^(?:https?:\/\/)?(?:www\.)?/i;
var urlList = [
  "www.net-temps.com",
  "www.toplanguagejobs.com",
  "http://nychires.com/",
  "http://www.peopleready.com/",
  "https://www.nationjob.com/",
  "http://nationaljobsonline.com/",
  "https://www.bluesteps.com/",
  "https://medium.freecodecamp.com/how-we-got-our-2-year-old-open-source-project-to-trend-on-github-8c25b0a6dfe9#.nl4985bjz",
  "https://www.theguardian.com/uk/business",
  "https://www.reddit.com/r/funny/comments/5qzkz4/my_captain_friend_sent_me_this_photo_saudi_prince/",
  "https://www.youtube.com/watch?v=Bua8k_CcnuI",
  "https://stackoverflow.com/questions/7000995/jquery-removing-part-of-string-after-and-removing-too/7001040#7001040",
  "http://www.pgatour.com/fantasy.html",
  "http://www.cultofmac.com/464645/apple-spaceship-campus-flyover/"
];

urlList.forEach(function(url) {
  let $originalEl = $("<div class='url'>" + url + "</div>"),
    cleanUrl = url.replace(regex, ""),
    $cleanEl = $("<div class='url'>" + cleanUrl + "</div>");
  $(".original").append($originalEl);
  $(".clean").append($cleanEl);
});
.original, .clean {
  background-color: grey;
  width: 25%;
  max-width: 350px;
  float: left;
}
.title {
  color: white;
  text-align: center;
  padding-top: 3px;
}
.url {
  background-color: lightgrey;
  margin: 5px;
  word-wrap:break-word;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div class="original">
  <div class="title"><b>original</b></div>
</div>
<div class="clean">
  <div class="title"><b>clean</b></div>
</div>

Upvotes: -2

ibrahim mahrir

Reputation: 31692

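Use /^(?:https?:\/\/)?(?:www\.)?/i, where both the https:// part and the www. part are made optional (?) and wrapped in non-capturing groups ((?:...)). Your original pattern required www. to follow the scheme, so URLs without it were left untouched by the replace and split('/')[0] then returned "http:" or "https:".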

var url = prompt("url: ");

url = url.replace(/^(?:https?:\/\/)?(?:www\.)?/i, "").split('/')[0];

alert("url: " + url);
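
For the textarea step mentioned in the question, a minimal sketch of how the same regex could be applied to a whole block of text, one URL per line. makeDomainBeautiful is the name used in the question; cleanUrlText and the one-URL-per-line format are assumptions made for this example:

```javascript
// Optional scheme followed by an optional "www.", anchored at the start.
var PREFIX = /^(?:https?:\/\/)?(?:www\.)?/i;

// Strip the prefix, then keep only what comes before the first "/".
function makeDomainBeautiful(url) {
  return url.trim().replace(PREFIX, "").split('/')[0];
}

// Hypothetical helper: assumes the text holds one URL per line.
function cleanUrlText(text) {
  return text
    .split(/\r?\n/)                                   // one entry per line
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return line !== ""; })  // skip blank lines
    .map(makeDomainBeautiful)
    .join("\n");
}

console.log(cleanUrlText("www.net-temps.com\nhttp://nychires.com/\nhttps://www.theguardian.com/uk/business\n"));
```

In a page, the input would come from the first textarea's value and the return value would be assigned to the second textarea's value.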

Upvotes: 49
