Reputation: 35
My code below (almost) manages to scrape each player's data into rows, with column values separated by commas. However, the player name cells seem to contain child elements whose text is also being displayed in separate rows. I simply want the text of the name, not the links. Also, some records are repeated in my output. Any help would be greatly appreciated! I am using BS4 and Python 3.5. Here is my code:
import urllib
import urllib.request
from bs4 import BeautifulSoup

def make_soup(url):
    page = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(page, "html.parser")
    return soupdata

currentdata = ""
soup = make_soup("http://www.foxsports.com/soccer/stats?competition=1&season=20160&category=STANDARD&pos=0&team=0&isOpp=0&sort=3&sortOrder=0&page=0")

for record in soup.findAll('tr'):
    playerdata = ""
    for data in record.findAll('td'):
        playerdata = playerdata + "," + data.text
    currentdata = currentdata + "\n" + playerdata
print(currentdata)
Upvotes: 0
Views: 107
Reputation: 12158
import urllib.request
from bs4 import BeautifulSoup

def make_soup(url):
    page = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(page, "html.parser")
    return soupdata

soup = make_soup("http://www.foxsports.com/soccer/stats?competition=1&season=20160&category=STANDARD&pos=0&team=0&isOpp=0&sort=3&sortOrder=0&page=0")

# class_=False keeps only the <tr> tags that have no class attribute
for record in soup.findAll('tr', class_=False):
    # get_text(',', strip=True) joins the text pieces inside each cell with commas
    row = [data.get_text(',', strip=True) for data in record.findAll('td')]
    print(' '.join(row))
out:
1,Sánchez, Alexis,Sánchez, A.,ARS 21 20 1786 14 7 30 72 3 0
1,Costa, Diego,Costa, D.,CHE 19 19 1681 14 5 26 57 5 0
1,Ibrahimovic, Zlatan,Ibrahimovic, Z.,MUN 20 20 1800 14 3 36 89 5 0
4,Kane, Harry,Kane, H.,TOT 16 16 1360 13 2 27 53 0 0
5,Lukaku, Romelu,Lukaku, R.,EVE 20 19 1737 12 4 28 55 3 0
5,Defoe, Jermain,Defoe, J.,SUN 21 21 1882 12 2 18 57 1 0
Some of the `tr` rows are ones you do not want. Use `class_=False`, which selects only the `tr` tags that do not have a `class` attribute. `get_text()` also lets you define a separator.
Upvotes: 1
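To see the two pieces of the answer without hitting the network, here is a minimal sketch that runs the same filter and separator on a small inline table. The markup in `SAMPLE_HTML` and the helper name `extract_rows` are assumptions made up for illustration; the real page's structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the stats table (not the real page).
SAMPLE_HTML = """
<table>
  <tr class="header"><td>Rank</td><td>Player</td><td>Goals</td></tr>
  <tr>
    <td>1</td>
    <td><a href="/p/1"><span>Kane, Harry</span><span>Kane, H.</span></a></td>
    <td>13</td>
  </tr>
  <tr>
    <td>2</td>
    <td><a href="/p/2"><span>Costa, Diego</span><span>Costa, D.</span></a></td>
    <td>14</td>
  </tr>
</table>
"""

def extract_rows(html):
    """Return one list of cell strings per <tr> that has no class attribute."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # class_=False skips rows such as <tr class="header">
    for record in soup.find_all("tr", class_=False):
        # The first argument to get_text is the separator placed between
        # text fragments; strip=True trims whitespace and drops empty ones.
        rows.append([td.get_text(",", strip=True) for td in record.find_all("td")])
    return rows

rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(" ".join(row))
```

Because the name cell in this sample holds two text fragments, the separator joins both forms of the name into one string, which matches the shape of the output shown in the answer.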