IronWorkshop

Reputation: 75

Use jsoup to extract text from 'form' class with variable page data

First post here, so I'll do my best to keep this specific. I have been using Jsoup to extract data from a number of web pages for a utility app. I have come across a page that updates its data dynamically based on the user's selection from a drop-down box. I can see the data when I inspect the HTML in Chrome, but I cannot seem to extract it. I can extract all the text elements around it, but anything generated dynamically won't come out.

The page I'm looking at contains the form below. Apologies for the wrapping; I couldn't get rid of it.

<form class="variations_form cart" method="post" enctype="multipart/form-data" data-product_id="8044" data-product_variations="[{&quot;variation_id&quot;:8047,&quot;variation_is_visible&quot;:true,&quot;variation_is_active&quot;:true,&quot;is_purchasable&quot;:true,&quot;display_price&quot;:19.70,&quot;display_regular_price&quot;:19.70,&quot;attributes&quot;:{&quot;attribute_size&quot;:&quot;500g&quot;},&quot;image_src&quot;:&quot;http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/08\/LABELS_500g-FOOD-Vann-475x652.png&quot;,&quot;image_link&quot;:&quot;http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/08\/LABELS_500g-FOOD-Vann.png&quot;,&quot;image_title&quot;:&quot;LABELS_500g-FOOD Vann&quot;,&quot;image_alt&quot;:&quot;&quot;,&quot;image_srcset&quot;:&quot;http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/08\/LABELS_500g-FOOD-Vann-746x1024.png 746w, http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/08\/LABELS_500g-FOOD-Vann-475x652.png 475w, http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/08\/LABELS_500g-FOOD-Vann.png 1063w&quot;,&quot;image_sizes&quot;:&quot;(max-width: 475px) 100vw, 475px&quot;,&quot;price_html&quot;:&quot;<span class=\&quot;price\&quot;><span class=\&quot;amount\&quot;>$19.70<\/span><\/span>&quot;,&quot;availability_html&quot;:&quot;&quot;,&quot;sku&quot;:&quot;FOOD-Vanilla-500&quot;,&quot;weight&quot;:&quot;.5 
kg&quot;,&quot;dimensions&quot;:&quot;&quot;,&quot;min_qty&quot;:1,&quot;max_qty&quot;:&quot;&quot;,&quot;backorders_allowed&quot;:false,&quot;is_in_stock&quot;:true,&quot;is_downloadable&quot;:false,&quot;is_virtual&quot;:false,&quot;is_sold_individually&quot;:&quot;no&quot;,&quot;variation_description&quot;:&quot;<p>500g<\/p>\n&quot;},{&quot;variation_id&quot;:8045,&quot;variation_is_visible&quot;:true,&quot;variation_is_active&quot;:true,&quot;is_purchasable&quot;:true,&quot;display_price&quot;:13.50,&quot;display_regular_price&quot;:13.50,&quot;attributes&quot;:{&quot;attribute_size&quot;:&quot;1kg&quot;},&quot;image_src&quot;:&quot;http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_1kg-FOOD-Van-475x652.png&quot;,&quot;image_link&quot;:&quot;http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_1kg-FOOD-Van.png&quot;,&quot;image_title&quot;:&quot;LABELS_1kg-FOOD Van&quot;,&quot;image_alt&quot;:&quot;&quot;,&quot;image_srcset&quot;:&quot;http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_1kg-FOOD-Van-746x1024.png 746w, http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_1kg-FOOD-Van-475x652.png 475w, http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_1kg-FOOD-Van.png 1063w&quot;,&quot;image_sizes&quot;:&quot;(max-width: 475px) 100vw, 475px&quot;,&quot;price_html&quot;:&quot;<span class=\&quot;price\&quot;><span class=\&quot;amount\&quot;>$13.50<\/span><\/span>&quot;,&quot;availability_html&quot;:&quot;&quot;,&quot;sku&quot;:&quot;FOOD-Vanilla-1kg&quot;,&quot;weight&quot;:&quot;1 
kg&quot;,&quot;dimensions&quot;:&quot;&quot;,&quot;min_qty&quot;:1,&quot;max_qty&quot;:&quot;&quot;,&quot;backorders_allowed&quot;:false,&quot;is_in_stock&quot;:true,&quot;is_downloadable&quot;:false,&quot;is_virtual&quot;:false,&quot;is_sold_individually&quot;:&quot;no&quot;,&quot;variation_description&quot;:&quot;<p>1kg<\/p>\n&quot;},{&quot;variation_id&quot;:8046,&quot;variation_is_visible&quot;:true,&quot;variation_is_active&quot;:true,&quot;is_purchasable&quot;:true,&quot;display_price&quot;:199.95,&quot;display_regular_price&quot;:199.95,&quot;attributes&quot;:{&quot;attribute_size&quot;:&quot;3kg&quot;},&quot;image_src&quot;:&quot;http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_3kg-FOOD-Van-475x652.png&quot;,&quot;image_link&quot;:&quot;http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_3kg-FOOD-Van.png&quot;,&quot;image_title&quot;:&quot;LABELS_3kg-FOOD Van&quot;,&quot;image_alt&quot;:&quot;&quot;,&quot;image_srcset&quot;:&quot;http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_3kg-FOOD-Van-746x1024.png 746w, http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_3kg-FOOD-Van-475x652.png 475w, http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_3kg-FOOD-Van.png 1063w&quot;,&quot;image_sizes&quot;:&quot;(max-width: 475px) 100vw, 475px&quot;,&quot;price_html&quot;:&quot;<span class=\&quot;price\&quot;><span class=\&quot;amount\&quot;>$199.95<\/span><\/span>&quot;,&quot;availability_html&quot;:&quot;&quot;,&quot;sku&quot;:&quot;FOOD-Vanilla-3kg&quot;,&quot;weight&quot;:&quot;3 kg&quot;,&quot;dimensions&quot;:&quot;&quot;,&quot;min_qty&quot;:1,&quot;max_qty&quot;:&quot;&quot;,&quot;backorders_allowed&quot;:false,&quot;is_in_stock&quot;:true,&quot;is_downloadable&quot;:false,&quot;is_virtual&quot;:false,&quot;is_sold_individually&quot;:&quot;no&quot;,&quot;variation_description&quot;:&quot;<p>3kg<\/p>\n&quot;}]">

  <table class="variations" cellspacing="0">
    <tbody>
      <tr>
        <td class="label">
          <label for="size">Size</label>
        </td>
        <td class="value">
          <select id="size" class="" name="attribute_size" data-attribute_name="attribute_size">
            <option value="">Choose an option</option>
            <option value="500g">500g</option>
            <option value="1kg" selected="selected">1kg</option>
            <option value="3kg">3kg</option>
          </select><a class="reset_variations" href="#" style="visibility: visible; display: block;">Clear selection</a>	
        </td>
      </tr>
    </tbody>
  </table>

  <div class="angelleye_buton_box_relative" style="position: relative;">

    <div class="single_variation_wrap">
      <div class="woocommerce-variation-description" style="border: 1px solid transparent;">
        <p>1kg</p>
      </div>
      <div class="single_variation"><span class="price"><span class="amount selectorgadget_selected">$13.50</span></span>
      </div>
      <div class="variations_button">
        <div class="quantity">
          <input type="number" step="1" name="quantity" value="1" title="Qty" class="input-text qty text" size="4" min="1">
        </div>
        <button type="submit" class="single_add_to_cart_button button alt">Add to basket</button>
        <input type="hidden" name="add-to-cart" value="8044">
        <input type="hidden" name="product_id" value="8044">
        <input type="hidden" name="variation_id" class="variation_id" value="8045">
      </div>
    </div>

    <div class="blockUI blockOverlay angelleyeOverlay" style="display:none;z-index: 1000; border: none; margin: 0px; padding: 0px; width: 100%; height: 100%; top: 0px; left: 0px; opacity: 0.6; cursor: default; position: absolute; background: url(http://www.sourcewebsite.com/wp-content/plugins/woocommerce/assets/images/select2-spinner.gif) 50% 50% / 16px 16px no-repeat rgb(255, 255, 255);"></div>
  </div>

</form>

I am trying to extract the price "13.50" from this div:

<div class="single_variation"><span class="price"><span class="amount selectorgadget_selected">$13.50</span></span>
</div>

My code is below:

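    // Imports belong at the top of the enclosing Activity file
    import android.os.AsyncTask;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;

    import java.io.IOException;

    // Inner class of the calling Activity
    private class ParseFoodPriceURL extends AsyncTask<String, Void, String> {

        @Override
        protected String doInBackground(String... strings) {
            StringBuilder buffer = new StringBuilder();
            try {
                Document doc = Jsoup.connect(strings[0]).get();
                // The div that displays the price of the selected variation
                Elements foodPrice = doc.select("div.single_variation_wrap > div.single_variation");
                // In the browser this text already includes the "$" sign, e.g. "$13.50"
                String priceTextSelection = foodPrice.text();
                buffer.append("Price: ").append(priceTextSelection);
            } catch (IOException e) {
                e.printStackTrace();
            }
            return buffer.toString();
        }
    }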

Upvotes: 4

Views: 349

Answers (1)

luksch

Reputation: 11712

JSoup is not a browser, so it does not interpret or execute JavaScript. If a website's content is generated dynamically, you can't get at it with JSoup directly. Two options come to mind:

  1. Identify the AJAX calls the page makes and request the information from those endpoints directly. The response is often JSON rather than HTML, so you may need a different parsing library. This option is fast, but you need to investigate and understand how the web page works.

  2. Use Selenium WebDriver with a real browser engine (PhantomJS, for example). This loads the website like a real browser, but lets you access its contents much as you would with JSoup. It is relatively easy to program, but slow and resource-hungry; running it on Android may be too much. For Android specifically, the right tool seems to be Selendroid.
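On this particular page, the variation data appears to ship with the HTML itself: the `data-product_variations` attribute of the form in the question holds a JSON array with each size's `display_price`. So a variant of option 1 may not need an extra request at all. With Jsoup, `doc.select("form.variations_form").attr("data-product_variations")` should return that attribute with the `&quot;` entities already decoded. The sketch below then maps each size to its price. To stay dependency-free it uses a regular expression rather than a JSON library, and it assumes that `display_price` appears before `attribute_size` within each variation object, as it does in the HTML above. `VariationPrices` and `pricesBySize` are hypothetical names, not part of Jsoup or WooCommerce.

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class VariationPrices {

        // One variation's display_price, followed later in the same object by its
        // attribute_size. Assumes the key order seen in the question's HTML.
        private static final Pattern VARIATION = Pattern.compile(
                "\"display_price\":([0-9.]+).*?\"attribute_size\":\"([^\"]*)\"");

        /**
         * Maps each size (e.g. "1kg") to its display price (e.g. "13.50").
         * Expects the decoded value of the form's data-product_variations
         * attribute, i.e. with &quot; already turned back into ".
         */
        public static Map<String, String> pricesBySize(String variationsJson) {
            Map<String, String> prices = new LinkedHashMap<>();
            Matcher m = VARIATION.matcher(variationsJson);
            while (m.find()) {
                prices.put(m.group(2), m.group(1));
            }
            return prices;
        }

        public static void main(String[] args) {
            // Trimmed-down copy of the attribute value from the question
            String json = "[{\"variation_id\":8047,\"display_price\":19.70,"
                    + "\"attributes\":{\"attribute_size\":\"500g\"}},"
                    + "{\"variation_id\":8045,\"display_price\":13.50,"
                    + "\"attributes\":{\"attribute_size\":\"1kg\"}}]";
            System.out.println(pricesBySize(json).get("1kg")); // prints 13.50
        }
    }

The selected size itself is in the static HTML too (`option[selected]` inside `select#size`), so the two can be combined to look up the displayed price. The regex is fragile; for anything beyond a quick utility, a real JSON parser such as `org.json` (bundled with Android) is the safer choice.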

Upvotes: 1
