Reputation: 1637
I want to crawl some web pages, such as the following:
http://www.youtube.com/user/koglin66/feed?filter=2
The page has a 'Load more' button, which triggers an AJAX request:
http://www.youtube.com/channel_ajax?action_load_more_feed_items=1&activity_view=1&paging=1352148528&channel_id=UCCw8aVnsIeu9S6OPQyaQ14g
I want to crawl the whole page. Done manually, I have to click the button repeatedly until there is nothing more to load. How can I automate this so that the whole page gets crawled? Thanks!
Upvotes: 0
Views: 983
Reputation: 101
Yes, you can use Selenium IDE, or another program or library built on a browser engine (such as WebKit or the Internet Explorer ActiveX control), to perform the click action.
You can also try FMiner (http://www.fminer.com/), which can record and replay human actions in a browser to scrape data, but it is not free.
Upvotes: 1
Reputation: 1
I recently faced the same problem with another website I wanted to scrape. I use Java, and after some research on the web I used Selenium IDE for Firefox, with which you can write Java JUnit test cases that automatically open the web page, click buttons, fill in forms, etc. It also supports C#, Python, Ruby, etc.
I used it to click the Load More button, and once the page was completely loaded after all the clicks, I saved it manually.
You can download Selenium from their website, and I found this YouTube video useful too: http://www.youtube.com/watch?v=twdDfDOrHC4
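For anyone who prefers a script over recording, here is a minimal sketch of the same click-until-done idea using the Selenium WebDriver Python bindings rather than Selenium IDE. It is not the exact setup I used: the function names, the `button.load-more` selector, the fixed pause, and the click limit are all assumptions for illustration, and it saves the page from `driver.page_source` instead of saving it manually.

```python
import time


def click_until_gone(driver, by, selector, pause=2.0, max_clicks=200):
    """Click the first visible matching element until none is left.

    Returns the number of clicks performed. `max_clicks` is a safety cap
    (an assumption of this sketch) so a page that never runs out of
    content cannot loop forever.
    """
    clicks = 0
    while clicks < max_clicks:
        # find_elements returns an empty list when nothing matches,
        # so no exception handling is needed for the "button gone" case.
        visible = [b for b in driver.find_elements(by, selector) if b.is_displayed()]
        if not visible:
            break
        visible[0].click()
        clicks += 1
        # Crude fixed wait for the AJAX response to be inserted into the page.
        time.sleep(pause)
    return clicks


def save_full_page(url, selector, out_path):
    """Open `url` in Firefox, expand it fully, and write the HTML to `out_path`."""
    # Imported here so click_until_gone can be tried without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        click_until_gone(driver, By.CSS_SELECTOR, selector)
        with open(out_path, "w", encoding="utf-8") as f:
            f.write(driver.page_source)
    finally:
        driver.quit()
```

A call such as `save_full_page("http://www.youtube.com/user/koglin66/feed?filter=2", "button.load-more", "feed.html")` would then do the whole job, provided you replace the hypothetical selector with the one the real button uses (inspect the page to find it). The fixed `time.sleep` is the simplest thing that could work; an explicit wait on the new content would probably be more reliable.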
Upvotes: 0