Reputation: 15
I've found a way to scrape other websites, but this site requires a spoofed "browser" (a User-Agent header) before it will serve the HTML. The thing is, after I add that, the program no longer crashes, but it also doesn't return anything.
Variables I want: rank, name, code, points (https://i.sstatic.net/POMEz.jpg)
This is the code I wrote, but it doesn't work on this website (it runs, but nothing is read or saved):
from urllib.request import urlopen as uReq
from urllib.request import Request
from bs4 import BeautifulSoup as soup

myUrl = "https://mee6.xyz/levels/159962941502783488"

# Spoof a regular browser so the site doesn't reject the request
req = Request(
    myUrl,
    data=None,
    headers={
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'
    }
)

uClient = uReq(req)
pageHtml = uClient.read()
uClient.close()

# Parse the page and look for the leaderboard entries
page_soup = soup(pageHtml, "html.parser")
containers = page_soup.findAll("div", {"class": "Player"})
print(containers)  # prints an empty list - nothing is found
The code that did work came from a YouTube tutorial, but when I change the URL to the mee6 leaderboard it won't work because the site refuses the request (it crashes for the mee6 URL):
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import csv

my_url = "https://www.newegg.ca/Product/ProductList.aspx?Submit=ENE&N=100007708%20601210955%20601203901%20601294835%20601295933%20601194948&IsNodeId=1&bop=And&Order=BESTSELLING&PageSize=96"

# Download and parse the product listing page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")
containers = page_soup.findAll("div", {"class": "item-container"})

filename = "GPU Prices.csv"
header = ['Price', 'Product Brand', 'Product Name', 'Shipping Cost']

with open(filename, 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(header)
    # Pull price, brand, name and shipping cost out of each product container
    for container in containers:
        price_container = container.findAll("li", {"class": "price-current"})
        price = price_container[0].text.replace('\xa0', ' ').strip(' –\r\n|')
        brand = container.div.div.a.img["title"]
        title_container = container.findAll("a", {"class": "item-title"})
        product_name = title_container[0].text
        shipping_container = container.findAll("li", {"class": "price-ship"})
        shipping = shipping_container[0].text.strip()
        csv_output.writerow([price, brand, product_name, shipping])
Upvotes: 0
Views: 395
Reputation: 22440
Try the approach below to fetch the data from that page. The webpage loads its content dynamically, so requests to the original URL won't return the leaderboard. Use your browser's dev tools to find the JSON endpoint, as I did here. Give it a shot:
import requests

URL = 'https://mee6.xyz/api/plugins/levels/leaderboard/159962941502783488'

res = requests.get(URL)
for item in res.json()['players']:
    name = item['username']
    discriminator = item['discriminator']
    xp = item['xp']
    print(name, discriminator, xp)
The output looks like this:
Sil 5262 891462
Birdie♫ 6017 745639
Delta 5728 641571
Mr. Squishy 0001 308349
Majick 6918 251024
Samuel (xCykrix) 1101 226470
WolfGang1710 6782 222741
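If you also want the rank and points from your screenshot, here is a minimal sketch built on the same endpoint; it assumes the players list in the JSON is already ordered by XP (which the output above suggests), so the list index can serve as the rank:

import requests

URL = 'https://mee6.xyz/api/plugins/levels/leaderboard/159962941502783488'

res = requests.get(URL)
# Assumption: the API returns players sorted by XP (highest first),
# so enumerating the list gives the leaderboard rank.
for rank, item in enumerate(res.json()['players'], start=1):
    name = item['username']
    discriminator = item['discriminator']  # the numeric "code" shown after the name
    points = item['xp']
    print(rank, name, discriminator, points)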
To write the results to a CSV file, you can do something like this:
import requests
import csv

Headers = ['Name', 'Discriminator', 'Xp']

res = requests.get('https://mee6.xyz/api/plugins/levels/leaderboard/159962941502783488')
with open('leaderboard.csv', 'w', newline='', encoding="utf-8") as infile:
    writer = csv.writer(infile)
    writer.writerow(Headers)
    for item in res.json()['players']:
        name = item['username']
        discriminator = item['discriminator']
        xp = item['xp']
        print(name, discriminator, xp)
        writer.writerow([name, discriminator, xp])
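As a small optional safeguard, you can have requests raise an error for a failed response before calling .json(), for example:

res = requests.get('https://mee6.xyz/api/plugins/levels/leaderboard/159962941502783488')
res.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx response
players = res.json()['players']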
Upvotes: 2