PyNoob

Reputation: 223

BeautifulSoup: How do I get text from a webpage into a dataframe?

I am trying to get the date value from a webpage into a dataframe using BeautifulSoup.

I want to get "17 May 2021"

How can I do it?

My existing code:

import pandas as pd
from selenium import webdriver
from bs4 import BeautifulSoup as bs

browser = webdriver.Chrome()

urls = {
    "https://www.oddsportal.com/matches/soccer/"
}


class GameData:

    def __init__(self):
        self.date = []

def parse_data(url):
    browser.get(url)
    df = pd.read_html(browser.page_source, header=0)[0]
    html = browser.page_source
    soup = bs(html, "lxml")
    cont = soup.find('div', {'id': 'wrap'})
    content = cont.find('div', {'id': 'col-content'})
    # Both attributes go in one dict; a third positional argument is read as `recursive`
    content = content.find('table', {'class': 'table-main', 'id': 'table-matches'})
    main = content.find('th', {'class': 'first2 tl'})
    if main is None:
        return None
    count = main.find_all('a')
    country = count[0].text
    league = count[1].text

[Screenshot: inspect element for the text value]

How can I get the date value into df?

Upvotes: 0

Views: 127

Answers (1)

QHarr

Reputation: 84465

I am not sure about your use of a class here, whether all the functions are meant to be part of it, or whether df actually has the shape and contents you intended. You also certainly need to return df from parse_data. That aside, to answer your exact question, put this at the end of the second function (parse_data):

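# The page heading is expected to end with the date, e.g. "..., 17 May 2021"
heading = soup.select_one('#col-content h1').text
# Append .date() to the next line if you only want the date without the time part
date = datetime.datetime.strptime(heading.split(', ')[-1], '%d %b %Y')
df['date'] = date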

And remember to import datetime.
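
To see the parsing step in isolation, here is a minimal sketch that uses only the standard library. The helper name parse_heading_date and the example heading text are assumptions for illustration; the real heading on the page may be worded differently, as long as the date comes after the last ", ". Note that %b matches abbreviated month names in the current locale, so this assumes an English locale.

```python
import datetime


def parse_heading_date(heading_text: str) -> datetime.date:
    """Return the date that follows the last ', ' in a heading string.

    Hypothetical helper, not part of the original code.
    """
    # Keep only the part after the last ", ", e.g. "17 May 2021",
    # and trim any stray whitespace or newlines around it.
    date_part = heading_text.split(", ")[-1].strip()
    # %d = day of month, %b = abbreviated month name, %Y = four-digit year.
    return datetime.datetime.strptime(date_part, "%d %b %Y").date()


# Assumed heading text; the actual page heading may differ.
example_heading = "Next Soccer Matches: Today, 17 May 2021"
match_date = parse_heading_date(example_heading)
print(match_date)  # 2021-05-17
```

Inside parse_data you could then pass it the text of the heading element, for example df['date'] = parse_heading_date(soup.select_one('#col-content h1').text). If select_one finds no match it returns None, so it may be worth checking for that before reading .text.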

Upvotes: 1
