angelokh

Reputation: 9428

Get similar links from one site using wget

I have a site (http://a-site.com) with many links like the one below. How can I use wget to crawl it and grep all similar links into a file?

<a href="/user/333333/follow_user" class="btn" rel="nofollow">Follow</a>

I tried the following, but this command only gets me the matching links on one page; it does not recursively follow other links to find more.

$ wget -erobots=off --no-verbose -r --quiet -O - http://a-site.com 2>&1 | \
  grep -o '['"'"'"][^"'"'"']*/follow_user['"'"'"]'

Upvotes: 0

Views: 133

Answers (1)

Skippy le Grand Gourou

Reputation: 7704

You may want to use the --accept-regex option of wget rather than piping through grep:

wget -r --accept-regex '['"'"'"][^"'"'"']*/follow_user['"'"'"]' http://a-site.com

(Not tested; the regex may need adjustment or an explicit --regex-type (see man wget), and of course add any other options you find useful.)
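
If the goal is to end up with all the matching links collected in a file, another equally untested sketch is to mirror the site first and then grep the saved pages. The pattern below only handles double-quoted href attributes, as in the example from the question, and follow_links.txt is just a placeholder output name:

# Untested sketch: mirror the site to disk, then extract matching
# hrefs from every saved page into one file.
wget -e robots=off --quiet -r http://a-site.com
grep -rho 'href="[^"]*/follow_user"' a-site.com/ \
  | sed 's/^href="//; s/"$//' \
  | sort -u > follow_links.txt

The sort -u at the end just deduplicates links that appear on more than one page.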

Upvotes: 1
