user17659672

Python regex: re.findall method throwing list index out of range error

I'm learning web scraping with Python regular expressions and practicing with the following script, but when I run it, it throws IndexError: list index out of range

import re
import json
import requests

url = 'https://www.att.com/buy/phones/'
html_text = requests.get(url).text

data = json.loads(re.findall(r'__NEXT_DATA__ = (.*?});', html_text)[0])
print(json.dumps(data['props']['pageProps']['deviceList'], indent=4))

Upvotes: 0

Views: 588

Answers (1)

Slybot

Reputation: 598

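The problem you are facing is a direct result of how websites change over time. Websites are not static, so a solution from 2019 may no longer work. Here the page no longer contains text matching your pattern, so re.findall returns an empty list and indexing it with [0] raises the IndexError. Instead of using a custom regex to find the JSON, I would suggest using Beautiful Soup (bs4) for a more robust script.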
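To make the failure mode concrete, here is a minimal sketch that needs no network access. The two HTML strings are made-up stand-ins (not the real att.com markup), and extract_next_data is a hypothetical helper name; the point is only that re.findall returns an empty list when nothing matches, so the result should be checked before indexing.

```python
import json
import re

# Hypothetical sample pages for illustration; not the real att.com markup.
OLD_STYLE = '<script>__NEXT_DATA__ = {"props": {"pageProps": {"deviceList": []}}};</script>'
NEW_STYLE = '<script id="__NEXT_DATA__" type="application/json">{"props": {}}</script>'

# Same pattern as in the question.
PATTERN = re.compile(r'__NEXT_DATA__ = (.*?});')


def extract_next_data(html_text):
    """Return the parsed JSON, or None when the pattern finds nothing."""
    matches = PATTERN.findall(html_text)
    if not matches:
        # findall returned [], so matches[0] would raise IndexError here.
        return None
    return json.loads(matches[0])


print(extract_next_data(OLD_STYLE))  # pattern matches, JSON is parsed
print(extract_next_data(NEW_STYLE))  # pattern does not match, no exception
```

Returning None (or raising a clearer error of your own) makes it obvious that the page layout changed, rather than surfacing as an unrelated-looking IndexError.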

The following code will give you what you wanted:

import json
import requests
from bs4 import BeautifulSoup

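url = 'https://www.att.com/buy/phones/'
html_text = requests.get(url).text
# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html_text, 'html.parser')
# The page data is JSON embedded in the <script id="__NEXT_DATA__"> tag.
data = json.loads(soup.find('script', id='__NEXT_DATA__').text)
print(json.dumps(data['props']['initialReduxState']['solr']['deviceList'], indent=4))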

Explanation of the code

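The requests library gets the raw HTML text from the given URL, and bs4 parses it. When no parser is named, Beautiful Soup picks the best one installed (lxml if available, then html5lib, then Python's built-in html.parser) and warns about it; passing a parser explicitly avoids the warning. Then we use the find function to search for the script tag with the id '__NEXT_DATA__' and take the text inside it, which is JSON. Finally, we load it with the json library and look up the new position of 'deviceList', which has moved from data['props']['pageProps'] to data['props']['initialReduxState']['solr']. For more documentation on bs4, please see https://www.crummy.com/software/BeautifulSoup/bs4/doc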
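Because the location of 'deviceList' has already moved once, it may move again. The sketch below shows one way to make that failure easier to diagnose; dig is a hypothetical helper (not part of json or bs4), and the data dict is a heavily trimmed stand-in for the parsed JSON.

```python
def dig(data, *keys):
    """Follow keys into nested dicts; raise KeyError naming the missing path."""
    current = data
    for depth, key in enumerate(keys):
        if not isinstance(current, dict) or key not in current:
            missing = " -> ".join(keys[: depth + 1])
            raise KeyError(f"missing key path: {missing}")
        current = current[key]
    return current


# Hypothetical, heavily trimmed stand-in for the parsed __NEXT_DATA__ JSON.
data = {"props": {"initialReduxState": {"solr": {"deviceList": [{"model": "iPhone 13"}]}}}}

# Current location used in the answer.
print(dig(data, "props", "initialReduxState", "solr", "deviceList"))

# Location used in the question; the error names where the lookup stopped.
try:
    dig(data, "props", "pageProps", "deviceList")
except KeyError as err:
    print(err)
```

When the lookup fails, printing list(data['props'].keys()) is usually the quickest way to find where the data lives now.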

First item of the long JSON output

{
        "firstNet": "notApplicable",
        "productFamily": "Phn13",
        "comingSoon": false,
        "skuId": "sku2360531",
        "brand": "Apple",
        "displayContentItems": [],
        "deviceGroup": "network",
        "starRatings": 4.5962,
        "numOfStarReviews": 2959,
        "mobileImageUrl": [
            "/idpassets/global/devices/phones/apple/apple-iphone-13/defaultimage/pink-hero-zoom.png?imwidth=219"
        ],
        "largeImageURL": "//www.att.com/catalog/en/skus/images/apple-iphone%2013-pink-450x350.png",
        "model": "iPhone 13",
        "productName": "Apple iPhone 13",
        "billCode": "6164D",
        "name": "jared",
        "PDPPageURL": [
            "/buy/phones/apple-iphone-13-128gb-pink.html"
        ],
        "prepaid": "",
        "productURL": "//www.att.com/cellphones/iphone/apple-iphone-13.html#sku=sku2360531",
        "condition": "New",
        "productId": "prod10340592",
        "htmlColor": "#FADDD7",
        "isPrepaid": false,
        "isRefurbished": false,
        "isPreOwned": false,
        "isPrePreOrderable": false,
        "type": "Device",
        "color": "Pink",
        "FinalPriceIRU": 22.23,
        "FinalPriceCRU": 22.23,
        "FinalPlanType": "monthly",
        "FinalPrice": 22.23,
        "FinalnextUpCharge": [
            0
        ],
        "FinalIRUnextUpCharge": [
            0
        ],
        "FinalCRUnextUpCharge": [
            0
        ],
        "FinalCommitmentTerm": "NE36MNUP",
        "FinalCommitmentTermCRU": "NE36MNUP",
        "FinalCommitmentTermIRU": "NE36MNUP",
        "FinalBasePriceCRU": 22.23,
        "FinalBasePriceIRU": 22.23,
        "FinalPlanTypeCRU": "monthly",
        "FinalPlanTypeIRU": "monthly",
        "FinalBasePrice": 22.23,
        "FinalTermLength": 36,
        "FinalTermLengthIRU": 36,
        "FinalTermLengthCRU": 36,
        "consumerOfferDescription": "$0 w/Trade",
        "cruOfferDescription": "$0 w/Trade",
        "iruOfferDescription": "$0 w/Trade",
        "consumerOfferDescriptionAL": "$0 w/Trade",
        "consumerOfferDescriptionUP": "$0 w/Trade",
        "iruOfferDescriptionAL": "$0 w/Trade",
        "iruOfferDescriptionUP": "$0 w/Trade",
        "cruOfferDescriptionAL": "$0 w/Trade",
        "cruOfferDescriptionUP": "$0 w/Trade",
        "allProductIds": [
            "prod10340592",
            "prod10340591",
            "prod10340593"
        ],
        "allSkuIds": [
            "sku2360531",
            "sku2360535",
            "sku2360534",
            "sku2360527",
            "sku2360528",
            "sku2360530",
            "sku2360529",
            "sku2360537",
            "sku2360526",
            "sku2360536",
            "sku2360533",
            "sku10940263",
            "sku10940264",
            "sku10940268",
            "sku10940269"
        ],
        "allBillCodes": [
            "6164D",
            "6166D",
            "6162D",
            "6165D",
            "6163D",
            "6169D",
            "6171D",
            "6167D",
            "6170D",
            "6168D",
            "6174D",
            "6176D",
            "6172D",
            "6175D",
            "6173D"
        ],
        "tradeInLegalModalPath": "/idpassets/fragment/legal/prod/legalcontent/wireless/offers/19900012/19900012_offertray_lm.cmsfeed.js",
        "tradeInLegalText": "Req\u2019s elig. unlimited (speed restr\u2019s apply) & trade-in. Price after 36 mo. credits. Other terms apply. ",
        "tradeInShortLegalLinkLabel": "See offer details",
        "tradeInPromoReference": "19900012",
        "tradeInMonthlyPromoPrice": "0",
        "tradeInLegalModalPathCRU": "/idpassets/fragment/legal/prod/legalcontent/wireless/offers/19900012/19900012_offertray_lm.cmsfeed.js",
        "tradeInLegalTextCRU": "Req\u2019s elig. unlimited (speed restr\u2019s apply) & trade-in. Price after 36 mo. credits. Other terms apply. ",
        "tradeInShortLegalLinkLabelCRU": "See offer details",
        "tradeInPromoReferenceCRU": "19900012",
        "tradeInMonthlyPromoPriceCRU": "0"
    }
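Each record is long, so in practice you will probably want only a few fields. A small sketch, assuming every item in 'deviceList' looks like the record above; device_list here is a hand-trimmed copy of that one record, and summarize is a hypothetical helper.

```python
# One record trimmed to a few of the fields shown in the output above.
device_list = [
    {
        "brand": "Apple",
        "model": "iPhone 13",
        "color": "Pink",
        "FinalPrice": 22.23,
        "FinalPlanType": "monthly",
        "FinalTermLength": 36,
    },
]


def summarize(device):
    """Build a one-line summary; .get() tolerates records that lack a field."""
    return "{} {} ({}): {} {}, term {}".format(
        device.get("brand"),
        device.get("model"),
        device.get("color"),
        device.get("FinalPrice"),
        device.get("FinalPlanType"),
        device.get("FinalTermLength"),
    )


for device in device_list:
    print(summarize(device))
```

With the real data you would loop over data['props']['initialReduxState']['solr']['deviceList'] instead of the hand-written list.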

Upvotes: 1
