Reputation: 473
I have been working on the code below and getting myself tied up in knots. What I am trying to do is build a simple dataframe using text scraped using BeautifulSoup.
I have scraped the applicable text from the <h5>
and <p>
tags but using find_all
means that when I build the dataframe and write to csv the tags are included. To deal with this I have added the print(p.text, end=" ")
statements but now nothing is being written to the csv.
Can anyone see what I am doing wrong?
import pandas as pd
import requests
from bs4 import BeautifulSoup
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
}
course = []
runner = []
page = requests.get('https://www.attheraces.com/tips/atr-tipsters/hugh-taylor', headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
tips = soup.find('div', class_='sticky')
for h5 in tips.find_all("h5"):
course_name = print(h5.text, end=" ")
course.append(course_name)
for p in tips.find_all("p"):
runner_name = print(p.text, end=" ")
runner.append(runner_name)
todays_tips = pd.DataFrame(
{'Course': course,
'Selection': runner,
})
print(todays_tips)
todays_tips.to_csv(r'C:\Users\*****\Today.csv')
Upvotes: 0
Views: 103
Reputation: 20042
Don't use the assignment for print
and consider using a list comprehension
. Applying this should get you the dataframe you want.
For example:
import pandas as pd
import requests
from bs4 import BeautifulSoup
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
}
page = requests.get('https://www.attheraces.com/tips/atr-tipsters/hugh-taylor', headers=headers)
tips = BeautifulSoup(page.content, 'html.parser').find('div', class_='sticky')
course = [h5.getText() for h5 in tips.find_all("h5")]
runner = [p.getText() for p in tips.find_all("p")]
todays_tips = pd.DataFrame({'Course': course, 'Selection': runner})
print(todays_tips)
todays_tips.to_csv("your_data.csv", index=False)
Output:
Course Selection
0 1.00 HAYDOCK 1pt win RAINBOW JET (12-1 & 11-1 general)
1 2.50 GOODWOOD 1pt win MARSABIT (11-2 general)
And a .csv
file:
Upvotes: 1