Myron Walters

Reputation: 223

Parse data with BeautifulSoup4

import requests
from bs4 import BeautifulSoup

# Download the standings page and parse the returned HTML
response = requests.get("http://www.lolesports.com/en_US/worlds/world_championship_2016/standings/default")
content = response.content
soup = BeautifulSoup(content, "html.parser")

# Look for every <text class="team-name"> element
team_name = soup.find_all('text', {'class': 'team-name'})

print(team_name)

I'm trying to parse data from the URL "http://www.lolesports.com/en_US/worlds/world_championship_2016/standings/default". The individual team names are in elements like <text class="team-name">SK Telecom T1</text>. I want to extract that text (SK Telecom T1) and print it to the screen, but instead I get [], an empty list. What am I doing wrong?

Upvotes: 2

Views: 177

Answers (2)

Padraic Cunningham

Reputation: 180391

You don't need Selenium. All of the dynamic content can be retrieved as JSON with a simple GET request to http://api.lolesports.com/api/v1/leagues:

import requests

data = requests.get("http://api.lolesports.com/api/v1/leagues?slug=worlds").json()

This returns a lot of data; what you want appears to be under data["teams"]. A snippet of it:

[{'id': 2, 'slug': 'bangkok-titans', 'name': 'Bangkok Titans', 'teamPhotoUrl': 'http://na.lolesports.com/sites/default/files/BKT_GPL.TMPROFILE_0.png', 'logoUrl': 'http://assets.lolesports.com/team/bangkok-titans-597g0x1v.png', 'acronym': 'BKT', 'homeLeague': 'urn:rg:lolesports:global:league:league:12', 'altLogoUrl': None, 'createdAt': '2014-07-17T18:34:47.000Z', 'updatedAt': '2015-09-29T16:09:36.000Z', 'bios': {'en_US': 'The Bangkok Titans are the undisputed champions of Thailand’s League of Legends esports scene. They achieved six consecutive 1st place finishes in the Thailand Pro League from 2014 to 2015. However, they aren’t content with just domestic domination.

Each team is one dict in that list of dicts:

In [1]: import requests

In [2]: data = requests.get("http://api.lolesports.com/api/v1/leagues?slug=worlds").json()

In [3]: for d in data["teams"]:
   ...:     print(d["name"])
   ...:
Bangkok Titans
ahq e-Sports Club
SK Telecom T1
TSM
Fnatic
Cloud9 
Counter Logic Gaming
H2K
Edward Gaming
INTZ e-Sports
paiN Gaming
Origen
LGD Gaming
Invictus Gaming
Royal Never Give Up
Flash Wolves
Splyce
Samsung Galaxy
KT Rolster
ROX Tigers
G2 Esports
I May
Albus NoX Luna
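
If you want this as a reusable function, here is a minimal sketch. It assumes the endpoint still responds and that the payload keeps the shape shown above (a "teams" list of dicts with a "name" key); the function names and the timeout value are my own choices, not part of the API.

```python
import requests

API_URL = "http://api.lolesports.com/api/v1/leagues?slug=worlds"


def team_names(payload):
    """Return the team names found in a decoded leagues payload.

    Assumes the shape shown above: payload["teams"] is a list of dicts,
    each with a "name" key. Entries without a name are skipped.
    """
    return [team["name"] for team in payload.get("teams", []) if team.get("name")]


def fetch_team_names(url=API_URL, timeout=10):
    """Fetch the JSON document and return its team names."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return team_names(response.json())
```

Calling fetch_team_names() should give the same names as the loop above, as long as the API is still available. Keeping team_names() separate lets you check the parsing against a saved payload without hitting the network.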

Upvotes: 2

BernardoGO

Reputation: 1856

The website depends on JavaScript to load its content. Requests does not execute JS, so the HTML it downloads never contains the team-name elements, and BeautifulSoup has nothing to find.

For websites like this you will be better off with Selenium. It drives a real browser (Firefox, or another browser through its driver) that renders the whole page, including the JS.
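
Something along these lines could work. It is only a sketch: it assumes Firefox and geckodriver are installed, that the page still renders elements with class "team-name" as in the question, and that ten seconds is enough for them to appear. The function names are made up for the example.

```python
from bs4 import BeautifulSoup


def extract_team_names(html):
    """Return the text of every <text class="team-name"> element in html."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.find_all("text", class_="team-name")]


def fetch_rendered_html(url, wait_seconds=10):
    """Load url in Firefox and return the HTML after the JS has run."""
    # Imported here so extract_team_names() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Block until at least one team-name element exists in the DOM.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ".team-name"))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

You would then call extract_team_names(fetch_rendered_html(url)) with the standings URL from the question. The explicit wait matters: reading page_source immediately after get() can return the page before the scripts have filled it in.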

Upvotes: 2
