user9371654

Reputation: 2408

Extract the domain name using regex in JavaScript

I have a list of domain names, e.g. developer.mozilla.org, and I need to extract only the domain name, e.g. mozilla.org. I tried RegExp but have not got it right so far, and I am not sure what I am missing.

I wrote this JavaScript, but it does not capture exactly the part I want:

var arr = ["developer.mozilla.org", "cdn.mdn.mozilla.net", "www.google-analytics.com", "www.youtube.com"];
var arrLength = arr.length;
var reg = new RegExp('((\\.[a-zA-Z0-9]+)(\\.[a-zA-Z0-9]+))$');

for (var i = 0; i < arrLength; i++)
{
    console.log(arr[i].match(reg));
}

Upvotes: 0

Views: 153

Answers (3)

James Wilkins

Reputation: 7388

You don't need a regex for this simple task.

var arr = ["developer.mozilla.org", "cdn.mdn.mozilla.net", "www.google-analytics.com", "www.youtube.com"];
var arrLength = arr.length;
for (var i = 0; i < arrLength; i++)
{
    var parts = arr[i].split('.');
    var domain = parts.slice(-2).join('.');
    console.log(domain);
}

Or a shorter version:

for (var i = 0; i < arr.length; i++)
{
    var domainName = arr[i].split('.').slice(-2).join('.');
    console.log(domainName);
}

slice(-2) returns a new array containing the last two elements, which join('.') then glues back together with a dot.
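A minimal sketch wrapping the same split/slice/join idea in a reusable function; the name lastTwoLabels is hypothetical. It simply keeps the last two dot-separated labels, so it will not return the registrable domain for multi-part suffixes such as co.uk, and a single-label input is returned unchanged.

```javascript
// Hypothetical helper: keep only the last two dot-separated labels of a host name.
function lastTwoLabels(host) {
  // "developer.mozilla.org".split('.') -> ["developer", "mozilla", "org"]
  // slice(-2) -> ["mozilla", "org"]; join('.') -> "mozilla.org"
  return host.split('.').slice(-2).join('.');
}

const hosts = ["developer.mozilla.org", "cdn.mdn.mozilla.net", "www.google-analytics.com", "www.youtube.com"];
console.log(hosts.map(lastTwoLabels).join(', '));
// mozilla.org, mozilla.net, google-analytics.com, youtube.com

// With fewer than two labels, slice(-2) returns whatever is there.
console.log(lastTwoLabels("localhost")); // localhost
```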

Upvotes: 0

Thad

Reputation: 975

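\w matches letters, digits and the underscore, but not the hyphen, so the hyphen has to be listed in the character class explicitly. Calling substring(1) on the first element of the match drops the leading dot so it is not printed. :)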

let arr = ["developer.mozilla.org", "cdn.mdn.mozilla.net",
    "www.google-analytics.com", "www.youtube.com"];
// Anchored with $ so only the last two labels match (otherwise
// "cdn.mdn.mozilla.net" yields "mdn.mozilla.net"); [\w-] also accepts hyphens.
let regex = /(\.[\w-]+)(\.[a-zA-Z0-9]+)$/;

arr.forEach(e => console.log(e.match(regex)[0].substring(1)));
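A quick check of what \w does and does not match, which is why a hyphen has to be added to the character class by hand. The sample characters below are illustrative only.

```javascript
// \w is shorthand for [A-Za-z0-9_]: it matches the underscore but not the hyphen.
const wordChar = /^\w$/;
for (const ch of ["a", "Z", "7", "_", "-", "."]) {
  console.log(JSON.stringify(ch), wordChar.test(ch));
}
// "a" true, "Z" true, "7" true, "_" true, "-" false, "." false

// Adding "-" to the class lets a hyphenated label match as a single unit.
console.log(/^[\w-]+$/.test("google-analytics")); // true
console.log(/^\w+$/.test("google-analytics"));    // false
```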

Upvotes: 0

sudavid4

Reputation: 1141

It works if you write your code like this:

var arr = ["developer.mozilla.org", "cdn.mdn.mozilla.net", "www.google-analytics.com", "www.youtube.com"];
var arrLength = arr.length;
var reg = /[^.]+\.[^.]+$/;

for (var i = 0; i < arrLength; i++)
{
    console.log(arr[i].match(reg)[0]);
}

Some explanations:

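First of all, there is a flaw in your regex that causes the 'google-analytics' entry to be missed: the character class [a-zA-Z0-9] does not include the hyphen, so "google-analytics" can never match as one label. I would suggest writing your regex like this instead: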

var reg = /[^.]+\.[^.]+$/
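A side-by-side sketch comparing the question's pattern with [^.]+\.[^.]+$ on the same list. [^.] means "any character except a dot", so hyphens are allowed inside a label; the variable names original and suggested are just labels for this comparison.

```javascript
const hosts = ["developer.mozilla.org", "cdn.mdn.mozilla.net", "www.google-analytics.com", "www.youtube.com"];
const original = /((\.[a-zA-Z0-9]+)(\.[a-zA-Z0-9]+))$/;
const suggested = /[^.]+\.[^.]+$/;

for (const host of hosts) {
  const a = host.match(original);
  const b = host.match(suggested);
  // match() returns null when there is no match, so guard before reading [0].
  console.log(host, "|", a ? a[0] : null, "|", b ? b[0] : null);
}
// For www.google-analytics.com the original pattern gives null,
// while the suggested one gives google-analytics.com.
```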

The regex you wrote has 3 capturing groups (the outer group plus the two inner ones), which explains the arrays you are getting from your console.log:

['.mozilla.org', '.mozilla.org', '.mozilla', '.org'] = [matched string, capturedGroup1, capturedGroup2, capturedGroup3]
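A small sketch to check how many groups the question's pattern actually captures: without the g flag, match returns the full match followed by one entry per capturing group, counting parentheses from left to right.

```javascript
const reg = new RegExp('((\\.[a-zA-Z0-9]+)(\\.[a-zA-Z0-9]+))$');
const m = "developer.mozilla.org".match(reg);

console.log(m.length);                   // 4 (full match + 3 groups)
console.log(Array.from(m).join(' | '));  // .mozilla.org | .mozilla.org | .mozilla | .org
console.log(m.index);                    // 9 (position of the dot before "mozilla")
```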

You could make your groups non-capturing by writing your regex like this:

var reg = new RegExp('(?:(?:\\.[a-zA-Z0-9]+)(?:\\.[a-zA-Z0-9]+))$');

or by using a regex literal, as @Bergi suggests:

var reg = /(?:(?:\.[a-zA-Z0-9]+)(?:\.[a-zA-Z0-9]+))$/
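With only non-capturing groups, the array returned by match shrinks to the matched string alone. A short sketch; note that the leading dot is still part of the match, since the pattern starts with \.

```javascript
const reg = /(?:(?:\.[a-zA-Z0-9]+)(?:\.[a-zA-Z0-9]+))$/;
const m = "developer.mozilla.org".match(reg);

console.log(m.length); // 1
console.log(m[0]);     // .mozilla.org
```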

In any case, the match method gives you an array back (or null when nothing matches), and what you are really interested in is the matched string, i.e. the first element of the array. You'd get the expected result by rewriting the body of the loop like this:

console.log((arr[i].match(reg) || [])[0]) // the || [] guards against string.match returning null
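Why the || [] guard matters: match returns null when nothing matches, and reading [0] from null throws a TypeError. A sketch using "localhost" as an example input with no dot; in environments that support ES2020, host.match(reg)?.[0] is an equivalent guard.

```javascript
const reg = /[^.]+\.[^.]+$/;

for (const host of ["www.youtube.com", "localhost"]) {
  // (null || [])[0] evaluates to undefined instead of throwing.
  console.log(host, (host.match(reg) || [])[0]);
}
// www.youtube.com youtube.com
// localhost undefined

let threw = false;
try {
  "localhost".match(reg)[0]; // match() returned null here
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(threw); // true
```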

If you really dislike the array, you could use string replace instead:

console.log(arr[i].replace(/^.*\.([^.]+\.[^.]+)$/, '$1'))
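One property of the replace approach worth knowing: when the pattern does not match (for example a host that already has only two labels), replace returns the string unchanged instead of null. A sketch with a hypothetical helper named toDomain:

```javascript
// Hypothetical helper: keep the last two labels via String.prototype.replace.
// The greedy ^.* consumes everything up to the dot before the last two labels.
const toDomain = host => host.replace(/^.*\.([^.]+\.[^.]+)$/, '$1');

console.log(toDomain("cdn.mdn.mozilla.net")); // mozilla.net
console.log(toDomain("mozilla.org"));         // mozilla.org (no match, returned as-is)
console.log(toDomain("localhost"));           // localhost (no match, returned as-is)
```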

Upvotes: 1
