Reputation: 223
I am trying to get the date shown on the page into a DataFrame using BeautifulSoup.
I want to get "17 May 2021".
How can I do it?
My existing code:
import pandas as pd
from selenium import webdriver
from bs4 import BeautifulSoup as bs

browser = webdriver.Chrome()

urls = {
    "https://www.oddsportal.com/matches/soccer/"
}


class GameData:
    def __init__(self):
        self.date = []


def parse_data(url):
    browser.get(url)
    df = pd.read_html(browser.page_source, header=0)[0]
    html = browser.page_source
    soup = bs(html, "lxml")
    cont = soup.find('div', {'id': 'wrap'})
    content = cont.find('div', {'id': 'col-content'})
    content = content.find('table', {'class': 'table-main'}, {'id': 'table-matches'})
    main = content.find('th', {'class': 'first2 tl'})
    if main is None:
        return None
    count = main.findAll('a')
    country = count[0].text
    league = count[1].text
How can I get the date value into df?
Upvotes: 0
Views: 127
Reputation: 84465
I am not sure how you intend to use the class here, whether all the functions are meant to be part of it, or whether df actually has the shape and contents you intended. You will certainly need to return df. That aside, to answer your exact question, put this at the end of the second function (parse_data):
date = datetime.datetime.strptime(soup.select_one('#col-content h1').text.split(', ')[-1], '%d %b %Y')  # append .date() to drop the time part
df['date'] = date
And remember to import datetime.
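To try the parsing step on its own, without a browser, here is a minimal sketch using only the standard library. The helper name parse_heading_date and the sample heading text are assumptions for illustration. I have not confirmed the exact wording of the h1 on the page, only that it ends with ", 17 May 2021". Since %b only matches abbreviated month names, the sketch also falls back to %B in case the page spells the month out in full.

```python
import datetime


def parse_heading_date(heading_text):
    """Return the date at the end of a heading such as '..., 17 May 2021'.

    Hypothetical helper, not part of the original code.
    """
    # Keep only the text after the last ', ' and trim stray whitespace.
    date_part = heading_text.split(", ")[-1].strip()
    # Try the abbreviated month name first (%b), then the full name (%B).
    for fmt in ("%d %b %Y", "%d %B %Y"):
        try:
            return datetime.datetime.strptime(date_part, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date text: {date_part!r}")


# Assumed heading text, standing in for soup.select_one('#col-content h1').text
heading = "Next Soccer Matches: Today, 17 May 2021"
print(parse_heading_date(heading))  # prints 2021-05-17
```

Inside parse_data you could then write df['date'] = parse_heading_date(soup.select_one('#col-content h1').text), which sets the same date on every row of df.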
Upvotes: 1