user3591111

Reputation: 102

Jsoup get text from website

I can already navigate the site and collect all the links I want, but my main objective is to get the hotel reviews. The site I am using is this: http://www.booking.com/hotel/pt/park-italia-flat.pt-pt.html?label=gen173nr-17CAEoggJCAlhYSDNiBW5vcmVmaLsBiAEBmAEvuAEEyAEE2AEB6AEB-AEL;sid=637e7af0c3009aa9ea132a960e2d2d40;dcid=4;ucfs=1;room1=A,A;srfid=b8260a1c264a3873291a9061733a43536a4d35c2X979#tab-reviews I can reach that page with jsoup without a problem, but I don't know how to get the review text. I have already tried getElementsByTag, getText, and other approaches. Can this be done with jsoup, or do I need another library? This is how I am trying to get the text, but the text that comes back is not what I want.

        Document doc ;
        try {
            doc = Jsoup.connect(pair.getValue().toString() + "#tab-reviews").get();
            Elements scriptElements = doc.getElementsMatchingText("span");
            for (Element link : scriptElements ) {
                System.out.printf(" Text: <%s> \n", link.text());
            }

        } catch (IOException ex) {
            Logger.getLogger(GetComentsThread.class.getName()).log(Level.SEVERE, null, ex);
        }

To get the URLs, I am using something like this:

Pattern pattern = Pattern.compile("src=destinationfinder");
Document doc = Jsoup.connect(url).get();
Elements links = doc.select("a[href]");
for (Element link : links) {
    Matcher matcher = pattern.matcher(link.attr("abs:href"));
    if (matcher.find()) {
        dest = link.attr("abs:href");
        break;
    }
}

Update: now I can get some reviews, but only the positive ones, and I don't know why.

doc = Jsoup.connect(pair.getValue().toString() + "#tab-reviews").get();
//doc = Jsoup.connect("http://www.booking.com/hotel/pt/pestanaportohotel.pt-pt.html?label=gen173nr-17CAEoggJCAlhYSDNiBW5vcmVmaLsBiAEBmAEvuAEEyAEE2AEB6AEB-AEL;sid=cff2dddd95e71c0768847a554584c888;dcid=4;dist=0;group_adults=2;room1=A%2CA;sb_price_type=total;srfid=798bd6b01ea1dba53ee6b6b945dda1f623859730X2;type=total;ucfs=1&#tab-reviews").get();
String teste = "p.trackit";

Elements scriptElements = doc.select(teste);
for (Element link : scriptElements) {
    //System.out.printf(" Text: <%s> ...%s\n", link.text(),link.attr("class=\"review_pos\""));
    System.out.printf(" Text: <> ...%s\n", link.text());
}

Upvotes: 0

Views: 205

Answers (2)

Davide Pastore

Reputation: 8738

The reviews are loaded by an AJAX request to a separate URL.

The response from that URL contains all the information you need.

Response:

<li class="
  review_item
  clearfix
  ">
  <p class="review_item_date">
    16 de Setembro de 2015
  </p>
  <div class="review_item_reviewer">
    <h4>
      Beatriz
    </h4>
    <span class="reviewer_country">
    <span class="reviewer_country_flag sflag slang-br">
    </span>
    Brasil
    </span>
  </div>
  <!-- .review_item_reviewer -->
  <div class="review_item_review">
    <div class="
      review_item_review_container
      lang_ltr
      seo_reviews_item
      ">
      <div class="review_item_review_header">
        <div class="
          review_item_header_score_container
          ">
          <div class="review_item_review_score jq_tooltip high_score_tooltip" title="
            Excepcional
            ">
            9,6
          </div>
        </div>
        <div class="review_item_header_content_container">
          <div class="review_item_header_content seo_review_title">
            Excepcional
          </div>
        </div>
      </div>
      <ul class="review_item_info_tags">
        <li class="review_info_tag"><span class="bullet">&bull;</span> Viagem de lazer</li>
        <li class="review_info_tag"><span class="bullet">&bull;</span> Família</li>
        <li class="review_info_tag"><span class="bullet">&bull;</span> Apartamento com Varanda</li>
        <li class="review_info_tag"><span class="bullet">&bull;</span> Ficou 5 noites</li>
        <li class="review_info_tag"><span class="bullet">&bull;</span> Submetido através de dispositivo móvel</li>
      </ul>
      <div class="review_item_review_content">
        <p class="review_pos"><i class="review_item_icon">&#45575;</i>Conforto, perto do centro, perto de um lindo mercado, bem decorado, com todo material necessário para fazer as refeições, Wi-Fi excelente</p>
      </div>
    </div>
  </div>
</li>
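As a rough illustration, markup shaped like the response above could be parsed with jsoup along these lines. This is only a sketch: the class name `ReviewFragmentParser` and the output format are made up for the example, the HTML is passed in as a string because the AJAX URL itself is not shown here, and the selectors assume the response keeps the class names seen above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of jsoup or of the site's API.
public class ReviewFragmentParser {

    /** Returns one "date | reviewer | score | text" line per li.review_item. */
    static List<String> parse(String html) {
        // parseBodyFragment accepts markup that is not a complete HTML page.
        Document doc = Jsoup.parseBodyFragment(html);
        List<String> lines = new ArrayList<>();
        for (Element item : doc.select("li.review_item")) {
            String date = item.select("p.review_item_date").text();
            String reviewer = item.select("div.review_item_reviewer h4").text();
            String score = item.select("div.review_item_review_score").text();

            // ownText() skips child elements, so the <i> icon glyph is left out.
            StringBuilder text = new StringBuilder();
            for (Element p : item.select("div.review_item_review_content p")) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(p.ownText());
            }
            lines.add(date + " | " + reviewer + " | " + score + " | " + text);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Shortened copy of the response shown above.
        String html =
            "<li class=\"review_item clearfix\">"
            + "<p class=\"review_item_date\">16 de Setembro de 2015</p>"
            + "<div class=\"review_item_reviewer\"><h4>Beatriz</h4></div>"
            + "<div class=\"review_item_review_score\">9,6</div>"
            + "<div class=\"review_item_review_content\">"
            + "<p class=\"review_pos\"><i class=\"review_item_icon\">+</i>Conforto, perto do centro</p>"
            + "</div></li>";
        parse(html).forEach(System.out::println);
    }
}
```

In real use the string would come from fetching the AJAX URL (for example with `Jsoup.connect(...).get()`), in which case `doc.select("li.review_item")` can be run on the returned `Document` directly.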

Upvotes: 1

Gugg

Reputation: 250

It looks like you just need to use jsoup to get the content of the elements with class="review_pos" and class="review_neg".
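A minimal sketch of that idea, assuming the negative reviews sit in `<p class="review_neg">` elements the same way the positive ones sit in `<p class="review_pos">`. The class name `PosNegReviews` and the sample markup in `main` are invented for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of jsoup or of the site's API.
public class PosNegReviews {

    /** Returns "positive: ..." / "negative: ..." lines in document order. */
    static List<String> extract(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        List<String> result = new ArrayList<>();
        // A comma in a CSS selector means "either of these".
        for (Element p : doc.select("p.review_pos, p.review_neg")) {
            String kind = p.hasClass("review_pos") ? "positive" : "negative";
            // ownText() leaves out the text of child elements such as the icon <i>.
            result.add(kind + ": " + p.ownText());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample; the review_neg markup is an assumption.
        String html =
            "<div class=\"review_item_review_content\">"
            + "<p class=\"review_neg\"><i class=\"review_item_icon\">-</i>Barulho da rua</p>"
            + "<p class=\"review_pos\"><i class=\"review_item_icon\">+</i>Wi-Fi excelente</p>"
            + "</div>";
        extract(html).forEach(System.out::println);
    }
}
```

Selecting both classes in one query keeps each review's positive and negative parts in the order they appear in the page.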

Upvotes: 0
