sak18

Reputation: 121

Beautiful Soup doesn't return all tags

I am new to web scraping and am using Python 3 with beautifulsoup4 to scrape information about some phones from att.com.

Here is my code to extract the outer div of each phone from the HTML (there are 49 phones in total):

from bs4 import BeautifulSoup
import requests
import csv

source = requests.get('https://www.att.com/buy/phones/').text
#soup = BeautifulSoup(source,'lxml')
#soup = BeautifulSoup(source,'html5lib')
soup  = BeautifulSoup(source,'html.parser')
phone_div=soup.findAll('div',class_='_1hOzu')
#phone_div=soup.findAll('div',class_='_2Ldwa')
#phone_div=soup.find('div',class_='_3kwdR')
#phone_div=soup.findAll('div',class_='_1BGB4')
print(phone_div[1].prettify())
print(phone_div[5].prettify())

Here is the output for the first phone div (the first four phones look similar). It contains all the information about the phone: name, price, etc.

<div class="_1hOzu">
 <div class="_14rcf _1NPjc false false" data-index="1" tabindex="0">
  <a class="_3-Yg9 _13w_Y" data-qa="DeviceTile-PDPlink-iPhone XS Max" href="/buy/phones/apple-iphone-xs-max-64gb-silver.html" tabindex="-1">
   <div class="_27UM0 false">
    <div class="_3C82I">
     <div class="_bOwfD">
      <span class="_2VSUp">
       Buy one, give one.
      </span>
     </div>
    </div>
   </div>
   <div class="_2pI5U">
    <div class="_3AUSX">
     <i class="_3cKi3" style="height:50px;width:50px">
     </i>
    </div>
    <div class="_VzvqU">
    </div>
   </div>
   <div>
    <div class="_1bjup">
     <div>
      <div class="_2Ldwa">
       APPLE
      </div>
      <div>
       <div class="_1BGB4">
        iPhone XS Max
       </div>
       <div class="_izQNb">
        placeholder
       </div>
      </div>
     </div>
    </div>
    <div class="_1NK_S">
     <div class="_1O0IX">
      <div class="_3AUSX">
       <i class="_3cKi3" style="height:50px;width:50px">
       </i>
      </div>
     </div>
    </div>
    <div>
     <div class="_3JaQ9 ">
      <div class="_1dPLs _3yvoJ _38PTM">
       <label class="_1ih28">
        <i class="_9V5dD _10JvD">
        </i>
        <span class="_1C-NR">
         Star Ratings
        </span>
        <input class="_ZI8n9" name="Customer Reviews" readonly="" type="radio" value="1"/>
       </label>
       <label class="_1ih28">
        <i class="_9V5dD _10JvD">
        </i>
        <span class="_1C-NR">
         Star Ratings
        </span>
        <input class="_ZI8n9" name="Customer Reviews" readonly="" type="radio" value="2"/>
       </label>
       <label class="_1ih28">
        <i class="_9V5dD _10JvD">
        </i>
        <span class="_1C-NR">
         Star Ratings
        </span>
        <input class="_ZI8n9" name="Customer Reviews" readonly="" type="radio" value="3"/>
       </label>
       <label class="_1ih28">
        <i class="_9V5dD _10JvD">
        </i>
        <span class="_1C-NR">
         Star Ratings
        </span>
        <input class="_ZI8n9" name="Customer Reviews" readonly="" type="radio" value="4"/>
       </label>
       <label class="_1ih28">
        <i class="_18XCu _10JvD">
        </i>
        <i class="_fLbUs _9V5dD _10JvD" style="width:58.95%">
        </i>
        <span class="_1C-NR">
         Star Ratings
        </span>
        <input class="_ZI8n9" name="Customer Reviews" readonly="" type="radio" value="5"/>
       </label>
      </div>
      <span>
       4.6
       <span class="_VCKql">
        |
       </span>
       531
      </span>
     </div>
     <p class="_2bs9E ">
      $36.67
      <span class="_31cDG">
       /mo.
      </span>
     </p>
    </div>
    <div class="_1YUjH">
     <div>
     </div>
     <div class="_3gbuG">
      <div>
       Req.’s 0% APR 30-mo. installment agmt, qual. credit and service.
      </div>
      <div class="_3gbuG">
       <button class="_1oGNe" data-index="1" data-qa="DeviceTilePLP-SeePriceDetails" tabindex="0">
        See
        <!-- -->
        price details.
       </button>
      </div>
     </div>
    </div>
    <div class="_3_rcU">
    </div>
   </div>
   <div class="_37Icd ">
   </div>
  </a>
 </div>
</div>

Output for the remaining phone divs:

<div class="_1hOzu">
 <div class="_14rcf _1NPjc false false" data-index="5" tabindex="0">
  <div class="_3AUSX">
   <i class="_3cKi3" style="height:50px;width:50px">
   </i>
  </div>
 </div>
</div>

For the remaining divs I do not get the nested inner tags, so I cannot extract anything from them. I have already read some SO answers about missing inner tags and tried different parsers based on those answers, but that did not help. Any idea where I might be going wrong?

Upvotes: 0

Views: 431

Answers (2)

Arun Augustine

Reputation: 1766

This happens because the page content is loaded dynamically. requests.get returns only the initial HTML, not all the tags you see in the browser's Inspect Element panel. (Check the page source: that is what you get as the response.)

To get that data, load the page with Selenium instead of a plain request. It returns the rendered page, like the one Inspect Element shows.

Example:

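from bs4 import BeautifulSoup
from selenium import webdriver

# Requires a local Chrome installation with a matching driver.
driver = webdriver.Chrome()
try:
    driver.get('https://www.att.com/buy/phones/')
    # page_source is the page as currently rendered by the browser
    content = BeautifulSoup(driver.page_source, 'html.parser')
finally:
    driver.quit()  # always close the browser

phone_div = content.find_all('div', class_='_1hOzu')
print(phone_div[1].prettify())
print(phone_div[5].prettify())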
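
To check that the inner tags are missing from the response itself, and not dropped by the parser, one could count how many tiles actually contain a model-name div. This is only a minimal sketch: it assumes the class names `_1hOzu` (tile) and `_1BGB4` (model name) from the question are still in use, and `split_tiles` and the trimmed-down sample markup are hypothetical stand-ins so that it runs without a network call.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup modelled on the two outputs in the question:
# one populated tile and one placeholder tile.
SAMPLE_HTML = """
<div class="_1hOzu"><div data-index="1"><div class="_1BGB4">iPhone XS Max</div></div></div>
<div class="_1hOzu"><div data-index="5"><div class="_3AUSX"><i class="_3cKi3"></i></div></div></div>
"""

def split_tiles(html_text):
    """Return (model names of populated tiles, number of placeholder tiles)."""
    soup = BeautifulSoup(html_text, 'html.parser')
    names, placeholders = [], 0
    for tile in soup.find_all('div', class_='_1hOzu'):
        name_div = tile.find('div', class_='_1BGB4')
        if name_div is None:
            placeholders += 1  # tile has no model name, so it was never filled in
        else:
            names.append(name_div.get_text(strip=True))
    return names, placeholders

names, placeholders = split_tiles(SAMPLE_HTML)
print(names, placeholders)
```

Passing `requests.get(url).text` and then `driver.page_source` to the same function should show whether the placeholder count drops once the browser has rendered the page.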

Upvotes: 1

Andrej Kesely

Reputation: 195408

The data is loaded through JavaScript, but the content is already inside the page. With a regular expression we can extract the JSON content (the full content is stored in the variable data):

import re
import json
import requests

url = 'https://www.att.com/buy/phones/'
html_text = requests.get(url).text

data = json.loads(re.findall(r'__NEXT_DATA__ = (.*?});', html_text)[0])
print(json.dumps(data['props']['pageProps']['deviceList'], indent=4))

Prints:

[
    {
        "color": "Black",
        "manufacturerShortName": "apple",
        "paymentType": "postpaid",
        "deviceSubType": "pda",
        "iotDevice": false,
        "starRatings": 4.6092,
        "newArrival": false,
        "imageUrl": "https://www.att.com/catalog/en/skus/images/apple-iphone%20xr-black-100x160.jpg",
        "model": "iPhone XR",
        "brand": "Apple",
        "skuId": "sku9240254",
        "displayContentItems": [
            {
                "displayType": "ribbon",
                "contentSource": "cms",
                "marketingPriority": 1,
                "flowTypes": [
                    "NEW",
                    "UP",
                    "AL"
                ],
                "enable": true,
                "description": "Buy one, give one.",
                "customerTypes": [
                    "CRU"
                ],
                "contentType": "image"
            },
            {
                "displayType": "ribbon",
                "contentSource": "cms",
                "marketingPriority": 1,
                "flowTypes": [
                    "NEW",
                    "UP",
                    "AL"
                ],
                "enable": true,
                "description": "Buy one, give one.",
                "customerTypes": [
                    "CONSUMER",
                    "IRU"
                ],
                "contentType": "image"
            },

...and so on.
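
Once the JSON is parsed, individual fields can be pulled out per device. A minimal sketch, assuming every entry in deviceList carries the brand, model and starRatings keys seen in the printed sample. The helper names `extract_device_list` and `summarize_devices` and the inline sample HTML are hypothetical, used so the example runs offline; the real page may be shaped differently.

```python
import json
import re

# Hypothetical stand-in for the page source, shaped the way the regex expects.
SAMPLE_HTML = """
<script>
__NEXT_DATA__ = {"props": {"pageProps": {"deviceList": [
  {"brand": "Apple", "model": "iPhone XR", "starRatings": 4.6092}
]}}};
</script>
"""

def extract_device_list(html_text):
    """Return the deviceList array embedded in the page, or [] if it is absent."""
    # DOTALL is added here (an assumption) so the JSON may span several lines.
    match = re.search(r'__NEXT_DATA__ = (.*?});', html_text, flags=re.DOTALL)
    if match is None:
        return []
    data = json.loads(match.group(1))
    return data.get('props', {}).get('pageProps', {}).get('deviceList', [])

def summarize_devices(devices):
    """Keep only brand, model and rating; missing keys become None."""
    return [(d.get('brand'), d.get('model'), d.get('starRatings')) for d in devices]

devices = extract_device_list(SAMPLE_HTML)
for brand, model, rating in summarize_devices(devices):
    print(brand, model, rating)
```

Note that the non-greedy pattern stops at the first `};` it meets, so it would cut the JSON short if that sequence ever appeared inside a string value; `json.loads` would then raise an error rather than return partial data.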

Upvotes: 1
