JasonBeedle

Reputation: 469

Scraped website data is not being written to a CSV

I am trying to scrape a website and write the extracted data to a CSV file. The data prints to the terminal, but I need it in a CSV file.

I have tried several approaches without success: the CSV file is created, but it is empty. I am probably missing something simple.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import csv
import time
from bs4 import BeautifulSoup

DRIVER_PATH = '/Users/jasonbeedle/Desktop/snaviescraper/chromedriver'

options = Options()
options.page_load_strategy = 'normal'

# Navigate to url
driver = webdriver.Chrome(options=options, executable_path=DRIVER_PATH)
driver.get("http://best4sport.tv/2hd/2020-12-10/")
options.add_argument("--window-size=1920x1080")
results = driver.find_element_by_class_name('program1_content_container')
soup = BeautifulSoup(results.text, 'html.parser')

# results = driver.find_element_by_class_name('program1_content_container')
p_data1 = soup.find_all("div", {"class_name": "program1_content_container"})
p_data2 = soup.find_all("div", {"class_name": "program_time"})
p_data3 = soup.find_all("div", {"class_name": "sport"})
p_data4 = soup.find_all("div", {"class": "program_text"})

print("Here is your data, I am off to sleep now, see ya")
print(results.text)
# Create csv
programme_list = []
# Programme List
for item in p_data1:
    try:
        name = item.contents[1].find_all(
            "div", {"class": "program1_content_container"})[0].text
    except:
        name = ''

    p_data1 = [time]
    programme_list.append(p_data1)

# Programme Time
for item in p_data2:
    try:
        time = item.contents[1].find_all(
            "div", {"class": "program_time"})[0].text
    except:
        time = ''

    p_data2 = [time]
    programme_list.append(p_data2)

# Which sport
for item in p_data3:
    try:
        time = item.contents[1].find_all(
            "div", {"class": "sport"})[0].text
    except:
        time = ''

    p_data3 = [time]
    programme_list.append(p_data3)

with open('sport.csv', 'w') as file:
    writer = csv.writer(file)
    for row in programme_list:
        writer.writerow(row)

I have also just tried adding a list called data_output and printing it:

data_output = [p_data1, p_data2, p_data3, p_data4]
...
print(data_output)

The output in the terminal is

Upvotes: 3

Views: 159

Answers (3)

Bsonjin

Reputation: 468

Instead of opening the file in binary mode, try changing wb to w.
Change

with open('sport.csv', 'wb') as file:

to

with open('sport.csv', 'w') as file:
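A minimal sketch of the text-mode pattern, assuming Python 3, where csv.writer expects a file opened in text mode and the csv documentation recommends passing newline=''. The file name demo_sport.csv and the sample rows are placeholders for illustration, not output from the scraper:

```python
import csv

# Placeholder rows; the real values would come from the scraper.
rows = [
    ["program_time", "sport"],
    ["19:55", "MOTORU SPORTS"],
]


def write_rows(path, rows):
    """Write a list of rows to a CSV file opened in text mode."""
    # newline='' lets the csv module control line endings itself,
    # which avoids blank lines between rows on Windows.
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)


write_rows("demo_sport.csv", rows)
```

Setting the encoding explicitly matters here because the page contains non-ASCII characters, and the platform default encoding may not be able to represent them.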

EDITED:

Sorry for the late reply. Here is a modified version of your original code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import csv
import time
from bs4 import BeautifulSoup

DRIVER_PATH = '/Users/jasonbeedle/Desktop/snaviescraper/chromedriver'

options = Options()
options.page_load_strategy = 'normal'
# Set arguments before creating the driver; changes made afterwards have no effect
options.add_argument("--window-size=1920,1080")

# Navigate to url
driver = webdriver.Chrome(options=options, executable_path=DRIVER_PATH)
driver.get("http://best4sport.tv/2hd/2020-12-10/")
results = driver.find_element_by_class_name('program1_content_container')
page = driver.page_source
soup = BeautifulSoup(page, 'html.parser')

# results = driver.find_element_by_class_name('program1_content_container')
p_data1 = soup.find_all("p", {"class": "program_info"})
p_data2 = soup.find_all("p", {"class": "program_time"})
p_data3 = soup.find_all("p", {"class": "sport"})
p_data4 = soup.find_all("p", {"class": "program_text"})

# Create csv
programme_list = []
# Programme List
for i in range(len(p_data1)):
    programme_list.append([p_data1[i].text.strip(), p_data2[i].text.strip(), p_data3[i].text.strip(), p_data4[i].text.strip()])
    
with open('sport.csv', 'w', encoding='utf-8', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["program_info", "program_time", "sport", "program_text"])
    for row in programme_list:
        writer.writerow(row)
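The loop above assumes that the four result lists line up one-to-one. If they can differ in length, one option is itertools.zip_longest, which pads the shorter lists rather than failing. This is only a sketch: the four lists below are hypothetical stand-ins for the stripped .text values of each find_all() result.

```python
from itertools import zip_longest

# Hypothetical stand-ins for the stripped .text of each find_all() result.
infos = ["info A", "info B"]
times = ["19:55", "20:00"]
sports = ["MOTORU SPORTS"]  # deliberately one entry short
texts = ["text A", "text B"]

# zip_longest pads the shorter lists with fillvalue instead of raising.
programme_list = [
    list(row)
    for row in zip_longest(infos, times, sports, texts, fillvalue="")
]

for row in programme_list:
    print(row)
```

Note that the padding is applied at the end of the shorter list, so this keeps the write from failing but does not guarantee that every value stays attached to the right programme.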

[Screenshot: the resulting CSV opened in Excel]

Upvotes: 1

KunduK

Reputation: 33384

Load the data into a pandas DataFrame and export it to CSV.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
from bs4 import BeautifulSoup

DRIVER_PATH = '/Users/jasonbeedle/Desktop/snaviescraper/chromedriver'
driver = webdriver.Chrome(executable_path=DRIVER_PATH)
driver.get("http://best4sport.tv/2hd/2020-12-10/")
results = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, ".program1_content_container")))
soup = BeautifulSoup(results.get_attribute("outerHTML"), 'html.parser')
program_time = []
sport = []
program_text = []
program_info = []
for item in soup.select(".program_details"):
    if item.find_next(class_='program_time'):
        program_time.append(item.find_next(class_='program_time').text.strip())
    else:
        program_time.append("Nan")
    if item.find_next(class_='sport'):
        sport.append(item.find_next(class_='sport').text.strip())
    else:
        sport.append("Nan")
    if item.find_next(class_='program_text'):
        program_text.append(item.find_next(class_='program_text').text.strip())
    else:
        program_text.append("Nan")
    if item.find_next(class_='program_info'):
        program_info.append(item.find_next(class_='program_info').text.strip())
    else:
        program_info.append("Nan")

df = pd.DataFrame({
    "program_time": program_time,
    "sport": sport,
    "program_text": program_text,
    "program_info": program_info,
})
print(df)
df.to_csv("sport.csv")

[Screenshot: sport.csv after creation]

If you don't have pandas then you need to install it.

pip install pandas
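If installing pandas is not an option, a similar export can be sketched with csv.DictWriter from the standard library. This is an alternative technique, not the pandas approach itself; the records and the output file name sport_dictwriter.csv are hypothetical, and restval plays the role of the "Nan" placeholder for missing fields.

```python
import csv

# Hypothetical per-programme records; missing fields are simply left out.
records = [
    {"program_time": "19:55", "sport": "MOTORU SPORTS"},
    {"program_time": "20:00", "sport": "BASKETBOLS", "program_text": "Example"},
]
fieldnames = ["program_time", "sport", "program_text", "program_info"]

with open("sport_dictwriter.csv", "w", encoding="utf-8", newline="") as f:
    # restval fills in any column that a record does not provide.
    writer = csv.DictWriter(f, fieldnames=fieldnames, restval="Nan")
    writer.writeheader()
    writer.writerows(records)
```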

Upvotes: 2

HedgeHog

Reputation: 25073

As Blue Fishy said, you can try changing to w mode only, but you may run into an encoding error.

A solution that works with your data:

import csv 

programme_list = ['19:55','MOTORU SPORTS','Motoru sporta "5 minūte"','Iknedēļas Alda Putniņa veidots apskats par motoru sportu','20:00','BASKETBOLS','...']

with open('sport.csv', 'w', encoding='utf-8') as file:
    writer = csv.writer(file, delimiter=',', lineterminator='\n')
    for row in programme_list:
        print(row)
        writer.writerow([row])

Output

19:55
MOTORU SPORTS
"Motoru sporta ""5 minūte"""
Iknedēļas Alda Putniņa veidots apskats par motoru sportu
20:00
BASKETBOLS
...
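The snippet above writes one value per line. If the flat list really repeats in groups of four values per programme, which is an assumption based on the sample values, it could be regrouped into one row per programme. The helper chunk, the column names, the second record and the file name sport_rows.csv are all hypothetical:

```python
import csv


def chunk(flat, size):
    """Split a flat list into consecutive lists of `size` items.

    The last list may be shorter if len(flat) is not a multiple of size.
    """
    return [flat[i:i + size] for i in range(0, len(flat), size)]


flat = [
    "19:55", "MOTORU SPORTS", 'Motoru sporta "5 minūte"',
    "Iknedēļas Alda Putniņa veidots apskats par motoru sportu",
    # Hypothetical second record, for illustration only.
    "20:00", "BASKETBOLS", "Example title", "Example description",
]

rows = chunk(flat, 4)

with open("sport_rows.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    # Column names are assumed, not taken from the site.
    writer.writerow(["time", "sport", "title", "description"])
    writer.writerows(rows)
```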

Upvotes: 1
