Reputation: 265
I am stuck on a rather simple problem: removing duplicate domains from a list of URLs using JavaScript.
Here's what I am currently doing: I have an array called 'list' that holds the URLs. I extract the domains from it and put them in a new array called 'domain'.
Then I use two for loops to go through the entire list and check for duplicate domains. If the domains match, I splice the duplicate out. But it seems to be removing too many, and I am pretty sure I am doing something wrong. Can somebody tell me what I am doing wrong, or suggest a simpler or better way of doing it?
for (i=0; i<list.length; i++) {
for (j=i+1; j<list.length; j++) {
if (domain[i] == domain[j]) {
console.log('REMOVING:');
console.log(i + '. ' + list2[i]);
console.log(j + '. ' + list2[j]);
console.log(domain[i]);
console.log(domain[j]);
list.splice(j,1);
}
}
}
This is not a 'how to remove duplicates from an array' question: I have a list of URLs and need to check for, and remove, only the entries with duplicate 'domains'. So if I have 4 URLs from youtube, I need to keep only the first one and remove the rest.
Upvotes: 1
Views: 1790
Reputation: 13557
Try to get rid of the domains array. Instead build a map of already "used" domains:
var urls = [
'http://example.org/page-1.html',
'http://example.org/page-2.html',
'http://google.com/search.html',
'http://mozilla.com/foo.html',
];
var domains = {};
var uniqueUrls = urls.filter(function(url) {
// whatever function you're using to parse URLs
var domain = extractDomain(url);
if (domains[domain]) {
// we have seen this domain before, so ignore the URL
return false;
}
// mark domain, retain URL
domains[domain] = true;
return true;
});
console.log(uniqueUrls);
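The `extractDomain` helper is left open above. As a minimal sketch, assuming an environment that provides the standard `URL` constructor (modern browsers, Node.js), it could look something like this. The `www.` stripping, the `null` fallback, and the sample URLs are my own assumptions for illustration.

```javascript
// Hypothetical helper: returns the hostname of an absolute URL,
// or null if the string cannot be parsed.
function extractDomain(url) {
  try {
    // hostname excludes protocol, port, and path; the leading "www." is
    // stripped here so that www.example.org and example.org count as one
    return new URL(url).hostname.replace(/^www\./, '');
  } catch (e) {
    return null; // note: all unparsable entries then share one key
  }
}

var urls = [
  'http://example.org/page-1.html',
  'http://www.example.org/page-2.html',
  'http://google.com/search.html'
];

// Object.create(null) avoids clashes with inherited keys such as "constructor"
var seen = Object.create(null);
var uniqueUrls = urls.filter(function (url) {
  var domain = extractDomain(url);
  if (seen[domain]) {
    return false;
  }
  seen[domain] = true;
  return true;
});
console.log(uniqueUrls);
// [ 'http://example.org/page-1.html', 'http://google.com/search.html' ]
```

Whether subdomains such as `www.` should be treated as the same domain depends on your data, so adjust the helper accordingly.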
Upvotes: 2
Reputation: 8131
If you are able to use the Underscore.js library, removing exact duplicates is as simple as
yourArray = _.uniq(yourArray);
Upvotes: 0
Reputation: 1219
The best way to remove duplicates is to use a map. The example has an array of URIs with some duplicates. First insert each host name into an object as a key, then iterate over the object's keys to create an array. Boom, no duplicates. Note that the resulting array holds the host names, not the original URIs.
function getHostName(url) {
var match = url.match(/:\/\/(www[0-9]?\.)?(.[^/:]+)/i);
if (match != null && match.length > 2 && typeof match[2] === 'string' && match[2].length > 0) {
return match[2];
}
else {
return null;
}
}
var uris = ["http://foo.org/barbar","http://www.bar.com/foo/bar/bar.html","http://foo.bar/lorem/","http://foo.org","https://bar.bar","http://foo.org","http://bar.bar"];
var urisObj = {};
for(var i = 0;i<uris.length;i++){
urisObj[getHostName(uris[i])] = getHostName(uris[i]);
}
uris = Object.keys(urisObj).map(function(x) { return urisObj[x];});
console.log(uris);
Edit:
Using http://www.primaryobjects.com/2012/11/19/parsing-hostname-and-domain-from-a-url-with-javascript/ to get the host name from a string.
Upvotes: 0
Reputation: 8131
If you want to do it your original way (or very close to it), go down the array instead of up it (with i++), so that splicing an element does not shift the indices you have yet to visit, as in the following code:
var list = ["abc", "cba", "abc", "abc", "abc", "abc"];
for (var i = list.length - 1; i >= 0; i--) {
for (var j = i-1; j >= 0; j--) {
if (list[i] == list[j]) {
console.log('REMOVING:');
console.log(i + '. ' + list[i]);
console.log(j + '. ' + list[j]);
console.log(list[i]);
console.log(list[j]);
list.splice(i, 1);
break; // list[i] has been removed, so stop comparing against it
}
}
}
console.log(list);
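The same backward walk can be applied to the setup in the question, where `list` holds the URLs and `domain` holds the matching domains. This is only a sketch, assuming the two arrays are index-aligned (the sample data is made up). The important detail is to splice both arrays so they stay in sync.

```javascript
var list = [
  'http://youtube.com/watch?v=1',
  'http://example.org/a',
  'http://youtube.com/watch?v=2',
  'http://youtube.com/watch?v=3',
  'http://example.org/b'
];
// Assumed to be index-aligned with list: domain[i] belongs to list[i]
var domain = ['youtube.com', 'example.org', 'youtube.com', 'youtube.com', 'example.org'];

for (var i = list.length - 1; i > 0; i--) {
  // indexOf returns the first position of this domain; if that is not i,
  // an earlier URL already covers the domain, so entry i is a duplicate
  if (domain.indexOf(domain[i]) !== i) {
    list.splice(i, 1);
    domain.splice(i, 1); // remove from both arrays to keep them aligned
  }
}
console.log(list);
// [ 'http://youtube.com/watch?v=1', 'http://example.org/a' ]
```

Because the loop runs from the end, removing entry `i` never moves any element that is still waiting to be checked.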
Upvotes: 0
Reputation: 115508
You can let an object handle the checking for you.
var a = [];
a.push('http://test');
a.push('http://that');
a.push('http://that');
a.push('http://that');
var o = {};
for(var ii = 0; ii < a.length; ii++){
o[a[ii]] = true; // only the key matters; duplicates overwrite the same property
}
var nA = [];
for (var k in o) {
nA.push(k);
}
Upvotes: 0
Reputation: 36511
ES5: filter the array and keep an item only if its index equals the index of its first occurrence in the array:
list.filter(function(elem, pos, arr) {
return arr.indexOf(elem) === pos;
});
ES6: use a Set
const uniqueDomains = [ ...new Set(list) ];
or if you can't use the spread syntax:
Array.from(new Set(list))
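Both snippets above compare whole strings, so two different URLs on the same domain would both survive. To keep one URL per domain, a `Set` can track the hostnames already seen. A sketch, assuming every entry is an absolute URL that the standard `URL` constructor can parse (the sample data is illustrative):

```javascript
const list = [
  'https://youtube.com/watch?v=1',
  'https://youtube.com/watch?v=2',
  'https://example.org/a',
  'https://youtube.com/watch?v=3'
];

const seenHosts = new Set();
const uniqueByDomain = list.filter((url) => {
  const host = new URL(url).hostname; // throws a TypeError on unparsable input
  if (seenHosts.has(host)) {
    return false; // an earlier URL already covers this host
  }
  seenHosts.add(host);
  return true;
});
console.log(uniqueByDomain);
// [ 'https://youtube.com/watch?v=1', 'https://example.org/a' ]
```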
Upvotes: 3