Reputation: 157
Here is a sample block of code I need to scrape:
<p>This paragraph contains <a href="http://twitter.com/chsweb" data-placement="below" rel="twipsy" target="_blank" data-original-title="Twitter">links to Twitter folks</a>, and <a href="http://twitter.com/blogcycle" data-placement="below" rel="twipsy" target="_blank" data-original-title="Twitter">more links to other Twitter folks</a>, but it also contains <a href="http://www.someOtherWebsiteHere.com">non-Twitter links too</a>. How can I list only the Twitter links below?</p>
This script generates a list of every URL on the page:
<script>
var allLinks = document.links;
for (var i=0; i<allLinks.length; i++) {
    document.write(allLinks[i].href+"<BR/>");
}
</script>
How do I modify the script so that it only lists URLs that contain a certain domain, e.g., twitter.com/?
Here is a demo page: http://chsweb.me/OucTum
Upvotes: 1
Views: 3213
Reputation: 13864
ORIGINAL: Not working on demo page (Sample 6)
<script>
if (allLinks[i].href.match("twitter\.com"))
{
document.write(allLinks[i].href+"<BR/>");
}
</script>
REVISED: Works on demo page (Sample 7)
<script>
var allLinks = document.links;
for (var i=0; i<allLinks.length; i++) {
    if (allLinks[i].href.match("twitter.com")) {
        document.write(allLinks[i].href+"<BR/>");
    }
}
</script>
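Since match() turns a string argument into a regular expression, the "." in "twitter.com" matches any character, not only a literal dot. When all you need is a substring test, String.prototype.includes sidesteps regular expressions entirely. A minimal sketch, assuming the hrefs have already been collected into an array of strings; hrefsContaining is a hypothetical name:

```javascript
// Hypothetical helper: keep only the hrefs that contain `needle` as a plain
// substring. No regular expression is involved, so "." is matched literally.
function hrefsContaining(hrefs, needle) {
  return hrefs.filter((href) => href.includes(needle));
}

// Example input based on the sample paragraph in the question.
const sample = [
  "http://twitter.com/chsweb",
  "http://twitter.com/blogcycle",
  "http://www.someOtherWebsiteHere.com/",
];

// Logs the two twitter.com URLs; the third href is dropped.
console.log(hrefsContaining(sample, "twitter.com"));
```

A substring test still accepts URLs that merely mention twitter.com in their path or query string; comparing the link's hostname avoids that.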
Upvotes: 0
Reputation: 123397
In modern browsers you can retrieve all the desired links with
var twitter_links = document.querySelectorAll('a[href*="twitter.com"]');
Using .querySelectorAll() may be slightly slower, but you probably won't notice any significant difference, and it makes the code shorter and easier to read than a for loop with a regular expression.
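The querySelectorAll approach can be wrapped in a small helper that returns plain href strings. A sketch, assuming a browser environment; hrefsMatchingSelector is a hypothetical name, and the fragment is assumed to contain no double quotes or backslashes because it is pasted into a CSS attribute selector:

```javascript
// Hypothetical helper: return the href of every <a> under `root` whose href
// attribute contains `fragment`. `root` is anything with querySelectorAll
// (document, an element, ...).
function hrefsMatchingSelector(root, fragment) {
  // querySelectorAll returns a static NodeList, so later DOM changes
  // (for example document.write) do not alter this result.
  const anchors = root.querySelectorAll(`a[href*="${fragment}"]`);
  // Array.from with a mapping function yields plain href strings.
  return Array.from(anchors, (a) => a.href);
}

// Browser usage:
// const twitterLinks = hrefsMatchingSelector(document, "twitter.com");
```

The [href*=...] selector tests the href attribute as written in the markup, while a.href returns the resolved absolute URL; for absolute links like the ones in the question the two agree.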
Upvotes: 1
Reputation: 108500
Link (<a>) elements expose the same URL properties as window.location (hostname, pathname, search, and so on), so you can use them to extract different parts of the href. For example:
var link = allLinks[i];
if ( /twitter\.com/.test( link.hostname ) ) {
    document.write(link.href+"<BR/>");
}
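A regular expression such as /twitter\.com/ also matches hosts like nottwitter.com or twitter.com.example.net. A stricter check compares the hostname with the domain exactly, or as a dot-separated suffix for subdomains. A sketch with a hypothetical helper name; to run outside a browser it uses the standard URL class, which exposes the same hostname property as a link element:

```javascript
// Hypothetical helper: true when `hostname` is exactly `domain` or one of
// its subdomains. Unlike /twitter\.com/.test(hostname), this rejects hosts
// such as "nottwitter.com" or "twitter.com.example.net".
function isDomainOrSubdomain(hostname, domain) {
  const host = hostname.toLowerCase();
  const base = domain.toLowerCase();
  return host === base || host.endsWith("." + base);
}

console.log(isDomainOrSubdomain(new URL("http://twitter.com/chsweb").hostname, "twitter.com")); // true
console.log(isDomainOrSubdomain(new URL("http://www.twitter.com/x").hostname, "twitter.com")); // true
console.log(isDomainOrSubdomain(new URL("http://nottwitter.com/").hostname, "twitter.com")); // false
```

In a page, the loop body would become if (isDomainOrSubdomain(link.hostname, "twitter.com")) { ... }.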
Another issue with your code: if document.write runs after the page has finished loading (for example, from an onload handler), the first call implicitly opens a new document and wipes out the existing one. Because document.links is a live collection that reflects the current document, it becomes empty, so the loop never gets past the first link. Collect the matches in an array instead and write them once:
var links = [];
for (var i=0; i<allLinks.length; i++) {
    var link = allLinks[i];
    if ( /twitter\.com/.test( link.hostname ) ) {
        links.push(link.href);
    }
}
document.write(links.join('<br>'));
Demo: http://jsfiddle.net/3xub6/
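Another way to avoid iterating a live collection is to copy it into a real array first with Array.from, then filter and map. A sketch, assuming a plain array-like object as a stand-in for document.links so it runs outside a browser; twitterHrefs and fakeLinks are hypothetical names:

```javascript
// Hypothetical helper: snapshot a (possibly live) collection of link-like
// objects into a real array, keep twitter.com hosts (and subdomains), and
// return their hrefs.
function twitterHrefs(linkCollection) {
  return Array.from(linkCollection)
    .filter((link) => /(^|\.)twitter\.com$/i.test(link.hostname))
    .map((link) => link.href);
}

// Plain array-like stand-in for document.links (assumption for this demo).
const fakeLinks = {
  0: { hostname: "twitter.com", href: "http://twitter.com/chsweb" },
  1: { hostname: "www.someOtherWebsiteHere.com", href: "http://www.someOtherWebsiteHere.com/" },
  2: { hostname: "twitter.com", href: "http://twitter.com/blogcycle" },
  length: 3,
};

console.log(twitterHrefs(fakeLinks).join("<br>"));
// → http://twitter.com/chsweb<br>http://twitter.com/blogcycle
```

In a page you would pass document.links instead of fakeLinks; the array is built before anything is written, so later writes cannot shrink it.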
Upvotes: 0
Reputation: 53198
The following will place all Twitter links in the twitter_links array:
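var twitter_links = [ ],
    links = document.getElementsByTagName('a');
// Use an index loop: for...in would also visit non-index keys such as "length".
for (var i = 0; i < links.length; i++)
{
    // Escape the dot so it only matches a literal ".".
    if (/twitter\.com/i.test(links[i].href))
    {
        twitter_links.push(links[i]);
    }
}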
Here's a jsFiddle: http://jsfiddle.net/Pv8DH/
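for...in enumerates property names rather than just element indexes, so on an HTMLCollection it also visits entries such as length, item and namedItem. The sketch below uses a plain array-like object as a stand-in for the collection (an assumption so it runs outside a browser) to show the extra key, then filters with an indexed loop; fakeCollection and twitterAnchors are hypothetical names:

```javascript
// Plain array-like stand-in for an HTMLCollection (assumption for this demo).
const fakeCollection = {
  0: { href: "http://twitter.com/chsweb" },
  1: { href: "http://www.someOtherWebsiteHere.com/" },
  length: 2,
};

// for...in walks enumerable property names, so it also visits "length".
const namesSeen = [];
for (const name in fakeCollection) namesSeen.push(name);
console.log(namesSeen.join(", ")); // 0, 1, length

// An indexed loop visits only the elements.
function twitterAnchors(collection) {
  const result = [];
  for (let i = 0; i < collection.length; i++) {
    if (/twitter\.com/i.test(collection[i].href)) result.push(collection[i]);
  }
  return result;
}

console.log(twitterAnchors(fakeCollection).length); // 1
```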
Upvotes: 0