Robsmith

Reputation: 473

Printing Text Scraped Using BeautifulSoup to Pandas Dataframe without Tags

I have been working on the code below and am tying myself in knots. I am trying to build a simple dataframe from text scraped with BeautifulSoup.

I have scraped the relevant text from the <h5> and <p> tags, but because find_all returns tag objects, the tags are included when I build the dataframe and write it to csv. To deal with this I added the print(p.text, end=" ") statements, but now nothing is being written to the csv.

Can anyone see what I am doing wrong?

import pandas as pd
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
}

course = []
runner = []

page = requests.get('https://www.attheraces.com/tips/atr-tipsters/hugh-taylor', headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
tips = soup.find('div', class_='sticky')
for h5 in tips.find_all("h5"):
    course_name = print(h5.text, end=" ")
    course.append(course_name)

for p in tips.find_all("p"):
    runner_name = print(p.text, end=" ")
    runner.append(runner_name)

todays_tips = pd.DataFrame(
    {'Course': course,
     'Selection': runner,
     })

print(todays_tips)

todays_tips.to_csv(r'C:\Users\*****\Today.csv')

Upvotes: 0

Views: 103

Answers (1)

baduker

Reputation: 20042

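Don't assign the result of print(). It writes to the console and returns None, so your lists end up holding None values instead of the text. Collect the text of each tag directly, ideally with a list comprehension. That should give you the dataframe you want.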
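A minimal sketch of the problem, using made-up strings in place of the scraped tag text (the names texts, printed and collected are only for illustration):

```python
# Made-up strings standing in for h5.text values (an assumption for illustration).
texts = ["1.00 HAYDOCK", "2.50 GOODWOOD"]

# Mirrors the loops in the question: print() shows the text but returns None.
printed = []
for text in texts:
    result = print(text, end=" ")
    printed.append(result)

# Appending the string itself keeps the text.
collected = [text for text in texts]
```

After this runs, printed holds only None values, while collected holds the two strings.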

For example:

import pandas as pd
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
}

page = requests.get('https://www.attheraces.com/tips/atr-tipsters/hugh-taylor', headers=headers)
tips = BeautifulSoup(page.content, 'html.parser').find('div', class_='sticky')

# get_text() returns each tag's text content without the surrounding markup
course = [h5.get_text() for h5 in tips.find_all("h5")]
runner = [p.get_text() for p in tips.find_all("p")]

todays_tips = pd.DataFrame({'Course': course, 'Selection': runner})
print(todays_tips)
todays_tips.to_csv("your_data.csv", index=False)

Output:

          Course                                  Selection
0   1.00 HAYDOCK  1pt win RAINBOW JET (12-1 & 11-1 general)
1  2.50 GOODWOOD            1pt win MARSABIT (11-2 general)

And a .csv file, your_data.csv, containing the same two rows.
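If you want to check the extraction without hitting the site, here is a rough offline sketch. The HTML string is made up to resemble the output above and is not the real page markup, and pairing the lists with zip is my own suggestion rather than part of the code above:

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the scraped page (an assumption, not the real markup).
html = """
<div class="sticky">
  <h5>1.00 HAYDOCK</h5>
  <p>1pt win RAINBOW JET (12-1 &amp; 11-1 general)</p>
  <h5>2.50 GOODWOOD</h5>
  <p>1pt win MARSABIT (11-2 general)</p>
</div>
"""

tips = BeautifulSoup(html, "html.parser").find("div", class_="sticky")

# strip=True also trims leading and trailing whitespace from the text.
course = [h5.get_text(strip=True) for h5 in tips.find_all("h5")]
runner = [p.get_text(strip=True) for p in tips.find_all("p")]

# Pair each course with its selection; zip stops at the shorter list.
rows = list(zip(course, runner))
```

The rows list can then be passed to pd.DataFrame(rows, columns=['Course', 'Selection']). Note that building the dataframe from a dict of two lists, as above, raises a ValueError if the page ever has a different number of <h5> and <p> tags.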

Upvotes: 1
