Cjoerg

Triggering AJAX requests/responses before parsing a webpage with Mechanize/Nokogiri

I am parsing a website that contains feedback from buyers. I want to collect the name of each buyer and the feedback he or she has given.

My issue is that only a few reviews are shown on the first page. The next page is loaded by clicking a button, and the website responds via AJAX. How do I get the new reviews from the AJAX response into my Mechanize page object? I want to "click" the AJAX trigger button as many times as possible, so that I get all the available reviews.

My code looks like this:

require 'mechanize'
require 'nokogiri'

url = "http://www.trustpilot.dk/review/www.fona.dk"

agent = Mechanize.new
page = agent.get(url)
reviews = page.search(".clear")

# The block form of File.open closes the log file automatically
File.open("log_file.txt", "w") do |log|
  reviews.each do |r|
    # Skip ".clear" elements that are not reviews (no profile link)
    name_link = r.at_css(".profileinfo a")
    next unless name_link

    log << "####################### NEW REVIEW #######################\n\n"
    log << "Customer name: #{name_link.text.strip}\n"

    # ".//" keeps the XPath relative to this review element;
    # a bare "//" would search the whole page
    rating = r.at_xpath(".//meta[@itemprop = 'ratingValue']/@content").to_s
    log << "Rating: #{rating}\n\n"
  end
end

For reference, the log file looks like this:

####################### NEW REVIEW #######################

Customer name: Hans-Oluf
Rating: 5

####################### NEW REVIEW #######################

Customer name: Jørgen
Rating: 3

####################### NEW REVIEW #######################

Customer name: Frederik
Rating: 4

The AJAX trigger appears to be in this piece of source code:

<div id="AjaxLoader_1" class="AjaxPager">
    <div class="AjaxPagerLinkWrapper">
        <a class="button AjaxPagerLink" href="http://www.trustpilot.dk/review/www.fona.dk?page=2">
            Vis flere anmeldelser
        </a>
    </div>
</div>
<script type="text/javascript">
    $(document).ready (function() {
        // Testing spilttest console.log("/domains/reviews?DID=767");
        // Get element right before this control
        var containerId = 'reviewContainer';
        var container = containerId == '' 
            ? $('#AjaxLoader_1').prev()
            : $($.f('#{0}', containerId));
        var pager = new Pager(
            1,
            25,
            'nextPageLoaded',
            'AjaxLoader_1',
            '/domains/reviews?DID=767',
            'page',
            '',
            container);
        });
</script>
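The Pager call above appears to receive the URL to load ('/domains/reviews?DID=767') and the name of the page-number parameter ('page') as arguments; the meaning of the other arguments is not stated in the source. Under that assumption, the request URL for a given page can be built with Ruby's standard `uri` library. This is a minimal sketch, and the helper name `ajax_page_url` is hypothetical:

```ruby
require 'uri'

# Hypothetical helper: builds the URL the Pager script appears to request for
# a given page, from the site root, the Pager's URL argument and its
# page-parameter name (both taken from the script; their roles are assumed).
def ajax_page_url(site, pager_path, param, page_number)
  uri = URI.join(site, pager_path)
  # Keep any existing query parameters (e.g. DID=767) and append the page number
  params = URI.decode_www_form(uri.query || "")
  params << [param, page_number.to_s]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts ajax_page_url("http://www.trustpilot.dk", "/domains/reviews?DID=767", "page", 2)
# prints http://www.trustpilot.dk/domains/reviews?DID=767&page=2
```

This only builds the string; it does not fetch anything. Passing 2, 3, 4 and so on yields the successive page URLs.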

Answers (1)

pguardiario

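Easy. The Pager call in the script shows where the data comes from: it loads '/domains/reviews?DID=767' and passes the page number in the 'page' parameter. So you just keep making GET requests to:

page = agent.get "http://www.trustpilot.dk/domains/reviews?DID=767&page=#{page_number}"

incrementing page_number each time, until there's no more data.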
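The "keep requesting until there's no more data" loop can be sketched as below. The method name `collect_all_pages` and the `max_pages` cap are hypothetical. Fetching is passed in as a block so the loop itself runs without Mechanize or network access; the Mechanize usage shown in the comments is an untested assumption that the endpoint returns HTML with the same review markup as the main page.

```ruby
# Hypothetical helper: calls the block with page numbers start_page,
# start_page + 1, ... and collects what it returns, stopping at the first
# page that returns nothing (or after max_pages pages, as a safety net).
def collect_all_pages(start_page: 1, max_pages: 1000)
  results = []
  page_number = start_page
  while page_number < start_page + max_pages
    items = yield(page_number)
    break if items.nil? || items.empty? # no more data
    results.concat(items)
    page_number += 1
  end
  results
end

# Possible use with Mechanize (not run here; assumes the endpoint returns
# HTML that Mechanize parses into a page with the ".profileinfo a" markup):
#
#   agent = Mechanize.new
#   names = collect_all_pages do |n|
#     page = agent.get("http://www.trustpilot.dk/domains/reviews?DID=767&page=#{n}")
#     page.search(".profileinfo a").map { |a| a.text.strip }
#   end

# Offline demonstration with made-up page contents standing in for the site:
fake_pages = { 1 => ["Hans-Oluf", "Frederik"], 2 => ["Jørgen"] }
names = collect_all_pages { |n| fake_pages.fetch(n, []) }
puts names.length # prints 3
```

The `max_pages` cap guards against a server that keeps returning data (for example, repeating the last page) for page numbers past the end.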
