Alex F

Reputation: 2274

Python BeautifulSoup findAll not returning all the elements?

I am trying to pull data from this URL: https://99airdrops.com/page/1/.

The code I have written is below.

import requests
from bs4 import BeautifulSoup

url_str = 'https://99airdrops.com/page/1/'

page = requests.get(url_str, headers={'User-Agent': 'Mozilla Firefox'})

# soup = BeautifulSoup(page.text, 'lxml')
soup = BeautifulSoup(page.text, 'html.parser')

# print(soup.prettify())

print(len(soup.findAll('div')))

print(soup.find('div', class_='title'))

My issue is that print(len(soup.findAll('div'))) returns only 23, and print(soup.find('div', class_='title')) prints None. find does not locate a div with class 'title' even though the page contains several of them. The div is nested deeply in the HTML, but that has never caused me problems before.

I've tried both the lxml and html.parser parsers, but neither returns all the div elements. I also tried writing the HTML to a file, reading it back in, and running BeautifulSoup on that, with the same results. Could someone tell me what the issue is here?

I also tried the suggestion in "Beautiful Soup - `findAll` not capturing all tags in SVG (`ElementTree` does)" to update my lxml package, but I still run into the same issue.

I also tried the solutions in "BeautifulSoup doesn't find correctly parsed elements", with no luck.

Upvotes: 0

Views: 1820

Answers (1)

G_M

Reputation: 3382

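The listings appear to be filled in by JavaScript from a JSON file after the page loads, which would explain why they are missing from the HTML that requests receives. It seems like you can get all of the data you are looking for with a single request to that file.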

>>> import requests
>>> r = requests.get('https://cdn.99airdrops.com/static/airdrops.json')
>>> data = r.json()
>>> len(data)
133

For example:

>>> import json; print(json.dumps(data.popitem(), indent=2))
[
  "pointium",
  {
    "unique": "pointium",
    "name": "Pointium",
    "currency": "PNT",
    "description": "Global Decentralized Platform for Point Management & Loyalty Program",
    "instructions": "<ol><li>Join Telegram <a href=\"https://t.me/pointium\" target=\"_blank\">@Pointium</a> and click \"Join Airdrop\" (+500 PNT) </li><li>Enter your e-mail (+200 PNT) </li><li><a href=\"https://twitter.com/POINTIUM_ICO\" target=\"_blank\">Follow Twitter</a> and submit your username (+500 PNT) </li><li>Confirm your details</li></ol>",
    "rating": "7.30",
    "addDate": "2018-04-20 06:23:03",
    "expirationDate": "2018-05-07",
    "startDate": "2018-04-07",
    "image": "https://cdn.99airdrops.com/static/pointium.jpeg",
    "joinLink": "https://www.pointium.org/airdrop",
    "sponsored": "0",
    "status": "0",
    "startDateFormatted": "7th of April",
    "expirationDateFormatted": "7th of May",
    "attributes": {
      "bitcointalk": "0",
      "category": "airdrop",
      "email": "1",
      "facebook": "0",
      "kyc": "0",
      "news": "https://twitter.com/POINTIUM_ICO",
      "opinion": "O parere personala este ca merge acest sistem foarte bine. Doar ca mai avem de lucrat la el sa fie bomba!",
      "other": "0",
      "phone": "0",
      "ratingConcept": "7",
      "ratingTeam": "5.5",
      "ratingWebsite": "7",
      "ratingWhitepaper": "8",
      "reddit": "0",
      "telegram": "1",
      "tokenGiven": "1200",
      "tokenPrice": "0.007",
      "tokenSupply": "1,600,000,000",
      "tokenType": "ERC20",
      "twitter": "1",
      "website": "www.pointium.org"
    }
  }
]
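
Once the JSON is loaded, the entries can be filtered like any other dict. The following is only a rough sketch: it assumes every entry has the "name", "currency", and "rating" keys seen in the example above, and the function name summarize_airdrops and the min_rating threshold are made up for illustration. The sample dict stands in for the real response so the snippet runs without a network call.

```python
def summarize_airdrops(data, min_rating=7.0):
    """Return (name, currency, rating) tuples for entries rated at least min_rating.

    `data` is assumed to be a dict keyed by the airdrop's unique id, as in the
    example entry above. Ratings are stored as strings, so they are converted
    to float before comparing.
    """
    rows = []
    for entry in data.values():
        rating = float(entry["rating"])
        if rating >= min_rating:
            rows.append((entry["name"], entry["currency"], rating))
    # Highest-rated entries first
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical stand-in for r.json(); only the first entry mirrors the real example.
sample = {
    "pointium": {"name": "Pointium", "currency": "PNT", "rating": "7.30"},
    "example": {"name": "Example", "currency": "EXM", "rating": "5.00"},
}

for name, currency, rating in summarize_airdrops(sample):
    print(f"{name} ({currency}): {rating}")
```

With the real data you would pass the result of r.json() instead of sample.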

Upvotes: 2
