Reputation: 25

Scraping Data using BS4 gives me unexpected results

I don't want to annoy you with my very basic questions, but I am stuck and I hope you can help me. I've done tutorials and watched many videos, but I can't figure out what I am doing wrong. I want to scrape data from this table: https://www.youpriboo.com/vorher_102_main_nat.php?action=show&liga=2.BL

This is my code:

import requests
from bs4 import BeautifulSoup

base_URL = 'https://www.youpriboo.com/vorher_102_main_nat.php?action=show&liga='
liga = '2.BL'
URL = base_URL + liga

headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36:'}

page = requests.get(URL, headers=headers)

soup = BeautifulSoup(page.content, 'html.parser')

for name in soup.find_all("td", class_="hac"):
    name1 = name.parent.find_all('td')[1] # team1 
    name2 = name.parent.find_all('td')[2] # team2
    wahr1 = name.parent.find_all('td')[6] # wahr1
    print(name1.get_text() +' '+ name2.get_text()+' '+ wahr1.get_text())

The problem is that it prints each match three times, and three extra numbers are listed between the games.

The expected result would look like this:

Arminia Bielefeld VfB Stuttgart 34,43
SV Wehen Wiesbaden VfL Osnabrück 34,51
(and so on)

Thanks for your time and work!

I have posted this also here: https://www.reddit.com/r/Python/comments/d9km7y/scraping_data_using_bs4_gives_me_unexpected/

Upvotes: 2

Answers (3)

SIM

Reputation: 22440

You can scrape the data and write the results to a CSV file in a few different ways. The one I prefer is pandas. Use the :has() pseudo-class to filter out the unwanted rows up front. The following should work:

import requests
import pandas as pd
from bs4 import BeautifulSoup

base_URL = 'https://www.youpriboo.com/vorher_102_main_nat.php?action=show&liga='
liga = '2.BL'

URL = f"{base_URL}{liga}"

page = requests.get(URL, headers={"User-Agent": 'Mozilla/5.0'})
soup = BeautifulSoup(page.content, 'html.parser')

rows = []
for tr in soup.select('.prognose_tab_1 tr:has(.greycell)'):
    name1 = tr.select('.hac')[1].get_text()
    name2 = tr.select('.hac')[2].get_text()
    wahr1 = tr.select('.greycell')[0].get_text()
    # DataFrame.append was removed in pandas 2.0, so collect the rows first
    rows.append({'Name_One': name1, 'Name_Ano': name2, 'Wahr': wahr1})

    print(f"{name1} {name2} {wahr1}")

# Build the DataFrame once from the collected rows
df = pd.DataFrame(rows, columns=['Name_One', 'Name_Ano', 'Wahr'])
df.to_csv("youpriboo.csv", encoding='utf-8', index=False)
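
If pandas is not available, the standard library's csv module is another of those ways. The following is only a minimal sketch: the two rows are copied from the sample output in this thread, and the helper name rows_to_csv is made up for illustration. In practice the rows would come from the scraping loop.

```python
import csv
import io

# Rows copied from the sample output in this thread (illustrative only);
# a real script would collect them inside the scraping loop.
rows = [
    ("Arminia Bielefeld", "VfB Stuttgart", "34,43"),
    ("SV Wehen Wiesbaden", "VfL Osnabrück", "34,51"),
]


def rows_to_csv(rows):
    """Hypothetical helper: serialize the rows to CSV text with a header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Name_One", "Name_Ano", "Wahr"])
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text)
```

Because the values use a decimal comma, csv.writer wraps them in quotes so they are not mistaken for two separate fields. To write to disk instead, open the file with newline='' and encoding='utf-8' and pass it to csv.writer.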

Upvotes: 1

KunduK

Reputation: 33384

Try the code below. It should give you the expected output.

import requests
from bs4 import BeautifulSoup

base_URL = 'https://www.youpriboo.com/vorher_102_main_nat.php?action=show&liga='
liga = '2.BL'
URL = base_URL + liga

headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36:'}

page = requests.get(URL, headers=headers)

soup = BeautifulSoup(page.content, 'html.parser')
table = soup.select_one(".prognose_tab_1")
for tr in table.select('tr'):
    if tr.select('.hac') and tr.select('.greycell'):
        name1 = tr.select('.hac')[1]
        name2 = tr.select('.hac')[2]
        wahr1 = tr.select('.greycell')[0]

        print(name1.get_text() + ' ' + name2.get_text() + ' ' + wahr1.get_text())

Output

Arminia Bielefeld VfB Stuttgart 34,43
SV Wehen Wiesbaden VfL Osnabrück 34,51
Jahn Regensburg Hamburger SV 24,18
Karlsruher SC 1. FC Heidenheim 37,70
VfL Bochum SV Darmstadt 98 55,22
Erzgebirge Aue Dynamo Dresden 37,70
FC St. Pauli SV Sandhausen 43,90
SpVgg Greuther Fürth Holstein Kiel 46,23
Hannover 96 1. FC Nürnberg 46,23

Upvotes: 0

Sureshmani Kalirajan

Reputation: 1938

The filtering is not correct: the loop in the question iterates over cells, not rows. Try iterating over the rows instead:

# Assumes `soup` was built as in the question
table = soup.find_all("tr")
for row in table:
    data = row.find_all("td", class_="hac")
    # Indexes 1 and 2 are read below, so at least three cells are required
    if len(data) > 2:
        print(data[1].get_text(), data[2].get_text())

    data = row.find_all("td", class_="greycell")
    if len(data) > 0:
        print(data[0].get_text())
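
To see why looping over cells repeats each match, a small offline sketch may help. The markup below is a hypothetical, simplified stand-in for the real table (the actual page layout is an assumption, and the "(date)" cells are placeholders), with three td.hac cells per row as the indexes used in this thread suggest.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page layout is an assumption.
SAMPLE_HTML = """
<table class="prognose_tab_1">
  <tr>
    <td class="hac">(date)</td>
    <td class="hac">Arminia Bielefeld</td>
    <td class="hac">VfB Stuttgart</td>
    <td class="greycell">34,43</td>
  </tr>
  <tr>
    <td class="hac">(date)</td>
    <td class="hac">SV Wehen Wiesbaden</td>
    <td class="hac">VfL Osnabrück</td>
    <td class="greycell">34,51</td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Looping over cells: each row is reached once per td.hac it contains.
per_cell = [cell.parent for cell in soup.find_all("td", class_="hac")]

# Looping over rows: each row is handled exactly once.
per_row = []
for row in soup.find_all("tr"):
    names = row.find_all("td", class_="hac")
    probs = row.find_all("td", class_="greycell")
    if len(names) > 2 and probs:
        per_row.append(
            f"{names[1].get_text()} {names[2].get_text()} {probs[0].get_text()}"
        )

print(len(per_cell))  # 6 visits for only 2 rows
print(per_row)
```

With this sample, the cell-based loop visits the two rows six times in total, while the row-based loop produces one line per match.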

Upvotes: 0
