Reputation:
I am trying to build a URL cleaner.
I want to take a list of URLs and remove https://, http://, www., etc. from the beginning, as well as everything from the first / after the domain onwards.
I have tried the following: url.replace(/^https?\:\/\/www\./i, "").split('/')[0];
This only partly works. It outputs the following
"www.net-temps.com"
"www.toplanguagejobs.com"
"http:"
"peopleready.com"
"nationjob.com"
"http:"
"bluesteps.com"
"https:"
"theguardian.com"
"reddit.com"
"youtube.com"
"https:"
"pgatour.com"
"cultofmac.com"
from the following list:
'www.net-temps.com',
'www.toplanguagejobs.com',
'http://nychires.com/',
'http://www.peopleready.com/',
'https://www.nationjob.com/',
'http://nationaljobsonline.com/',
'https://www.bluesteps.com/',
'https://medium.freecodecamp.com/how-we-got-our-2-year-old-open-source-project-to-trend-on-github-8c25b0a6dfe9#.nl4985bjz',
'https://www.theguardian.com/uk/business',
'https://www.reddit.com/r/funny/comments/5qzkz4/my_captain_friend_sent_me_this_photo_saudi_prince/',
'https://www.youtube.com/watch?v=Bua8k_CcnuI',
'https://stackoverflow.com/questions/7000995/jquery-removing-part-of-string-after-and-removing-too/7001040#7001040',
'http://www.pgatour.com/fantasy.html',
'http://www.cultofmac.com/464645/apple-spaceship-campus-flyover/'
If I remove the www\. part from the regex, it works well and strips https:// etc., but I'd also like to remove www. when it's present, whether or not the URL starts with http:// or https://.
This is what I have coded so far:
https://jsfiddle.net/xba5x9ro/1/
Once this is sorted, I would like to take a list of URLs from a textarea, run makeDomainBeautiful on each one, and output the results to another textarea, but I thought I'd get this working first.
Upvotes: 11
Views: 23213
Reputation: 1079
Don't use a regular expression here; there is no need, and it opens up the possibility of making mistakes. Use JavaScript's native URL constructor instead:
let url = new URL('https://stackoverflow.com/questions/7000995/jquery-removing-part-of-string-after-and-removing-too/7001040#7001040');
let hostname = url.hostname;
console.log(hostname); // stackoverflow.com
https://developer.mozilla.org/en-US/docs/Web/API/URL/URL
Addendum: To remove the www subdomain specifically, you can do this:
let hostname_without_www = hostname.replace(/^www\./, "")
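One caveat worth illustrating: new URL() throws a TypeError when the input has no scheme, which is the case for entries such as 'www.net-temps.com' in the question's list. A minimal sketch of one way to handle that follows. The helper name toHostname and the choice to prepend http:// are assumptions made for illustration, not part of the answer above.

```javascript
// Hypothetical helper (not from the answer): returns the hostname without a
// leading "www.", or null when the input cannot be parsed as a URL.
function toHostname(input) {
  const trimmed = input.trim();
  // Assumption: an input without a scheme is meant to be an http(s) site,
  // so a scheme is prepended purely to make it parseable.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  const withScheme = hasScheme ? trimmed : 'http://' + trimmed;
  try {
    return new URL(withScheme).hostname.replace(/^www\./i, '');
  } catch (err) {
    return null; // not parseable even with a scheme
  }
}

console.log(toHostname('www.net-temps.com'));                       // net-temps.com
console.log(toHostname('https://www.theguardian.com/uk/business')); // theguardian.com
console.log(toHostname('http://nychires.com/'));                    // nychires.com
```

Returning null instead of throwing is a design choice for batch cleaning, so that one malformed line does not abort the whole list.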
Upvotes: 2
Reputation: 1
You can remove the first '/' after the domain, and everything that follows it, without using split. Just extend the regex with a second alternative.
const url = 'https://example.com/';
const regex = /^(?:https?:\/\/)?(?:www\.)?|\/.*$/gi;
console.log(url.replace(regex, ""));
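To check how this single pattern behaves beyond the one example, here is a small sketch that runs it over a few entries from the question's list. The variable names cleanRegex, samples and cleaned are illustrative only.

```javascript
// Same pattern as above: the first alternative strips an optional scheme and
// "www." at the start; the second strips everything from the first remaining
// "/" to the end. The g flag is needed so both alternatives can apply.
const cleanRegex = /^(?:https?:\/\/)?(?:www\.)?|\/.*$/gi;

const samples = [
  'www.net-temps.com',
  'http://nychires.com/',
  'https://www.youtube.com/watch?v=Bua8k_CcnuI',
];

const cleaned = samples.map((u) => u.replace(cleanRegex, ''));
console.log(cleaned); // [ 'net-temps.com', 'nychires.com', 'youtube.com' ]
```

Note that a query string with no path before it (for example 'example.com?x=1') contains no '/', so this pattern would leave it in place.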
Upvotes: -1
Reputation: 7
var url = prompt("url: ");
url = url.replace(/^(?:https?:\/\/)?(?:www\.)?/i, "").split('/')[0];
alert("url: " + url);
Upvotes: -2
Reputation: 641
This will take care of http, https and www:
url.replace(/^(?:https?:\/\/)?(?:www\.)?/i, "").split('/')[0]
Upvotes: 0
Reputation: 828
Based on ibrahim mahrir's answer, this is for when you just want to trim http or https and www from the start of the URL but keep the rest. I mocked it up in CodePen to test it, and it seems to work nicely: https://codepen.io/pureth/pen/LQOaPz
var regex = /^(?:https?:\/\/)?(?:www\.)?/i;
var urlList = [
"www.net-temps.com",
"www.toplanguagejobs.com",
"http://nychires.com/",
"http://www.peopleready.com/",
"https://www.nationjob.com/",
"http://nationaljobsonline.com/",
"https://www.bluesteps.com/",
"https://medium.freecodecamp.com/how-we-got-our-2-year-old-open-source-project-to-trend-on-github-8c25b0a6dfe9#.nl4985bjz",
"https://www.theguardian.com/uk/business",
"https://www.reddit.com/r/funny/comments/5qzkz4/my_captain_friend_sent_me_this_photo_saudi_prince/",
"https://www.youtube.com/watch?v=Bua8k_CcnuI",
"https://stackoverflow.com/questions/7000995/jquery-removing-part-of-string-after-and-removing-too/7001040#7001040",
"http://www.pgatour.com/fantasy.html",
"http://www.cultofmac.com/464645/apple-spaceship-campus-flyover/"
];
urlList.forEach(function(url) {
  let $originalEl = $("<div class='url'>" + url + "</div>"),
      cleanUrl = url.replace(regex, ""),
      $cleanEl = $("<div class='url'>" + cleanUrl + "</div>");
  $(".original").append($originalEl);
  $(".clean").append($cleanEl);
});
.original, .clean {
background-color: grey;
width: 25%;
max-width: 350px;
float: left;
}
.title {
color: white;
text-align: center;
padding-top: 3px;
}
.url {
background-color: lightgrey;
margin: 5px;
word-wrap:break-word;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div class="original">
<div class="title"><b>original</b></div>
</div>
<div class="clean">
<div class="title"><b>clean</b></div>
</div>
Upvotes: -2
Reputation: 31692
/^(?:https?:\/\/)?(?:www\.)?/i
where both https:// and www. should be optional (?) and wrapped in non-capturing groups ((?:...)).
var url = prompt("url: ");
url = url.replace(/^(?:https?:\/\/)?(?:www\.)?/i, "").split('/')[0];
alert("url: " + url);
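As a sketch of how this could feed the textarea workflow mentioned in the question, the pattern can be wrapped in a function and applied line by line. makeDomainBeautiful is the name used in the question, but its body here is an assumption built from the regex above; cleanUrlList is a hypothetical helper, and the DOM wiring is left out so the snippet runs anywhere.

```javascript
// Name taken from the question; the body is an assumption based on this answer.
function makeDomainBeautiful(url) {
  return url.replace(/^(?:https?:\/\/)?(?:www\.)?/i, '').split('/')[0];
}

// Hypothetical helper: cleans a block of text holding one URL per line,
// such as the value of a textarea, and returns one domain per line.
function cleanUrlList(text) {
  return text
    .split(/\r?\n/)                     // one URL per line
    .map((line) => line.trim())
    .filter((line) => line.length > 0)  // skip blank lines
    .map(makeDomainBeautiful)
    .join('\n');
}

const input = [
  'www.net-temps.com',
  'http://nychires.com/',
  'https://medium.freecodecamp.com/how-we-got-our-2-year-old-open-source-project-to-trend-on-github-8c25b0a6dfe9#.nl4985bjz',
  'http://www.pgatour.com/fantasy.html',
].join('\n');

console.log(cleanUrlList(input));
// net-temps.com
// nychires.com
// medium.freecodecamp.com
// pgatour.com
```

In a page, the input string would come from one textarea's value and the result would be assigned to the other's.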
Upvotes: 49