f e

Reputation: 73

Extract just the date from a BeautifulSoup result

I am trying to scrape a date from a website using BeautifulSoup:

I have got it down to this:

How do I extract only the date-time from this? I only want: May 21, 2021 19:47

Upvotes: 1

Views: 376

Answers (1)

Andrej Kesely

Reputation: 195418

You can use this example to extract the date-time from the <ctag> elements:

from bs4 import BeautifulSoup

html_doc = """
    <ctag class="">May 21, 2021 19:47 Source: <span>BSE</span> </ctag>
"""

soup = BeautifulSoup(html_doc, "html.parser")

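for ctag in soup.find_all("ctag"):
    # get_text(strip=True) gives "May 21, 2021 19:47 Source:BSE";
    # splitting once from the right drops the trailing "Source:BSE" token
    dt = ctag.get_text(strip=True).rsplit(maxsplit=1)[0]
    print(dt)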

Prints:

May 21, 2021 19:47

Or:

for ctag in soup.find_all("ctag"):
    dt = ctag.contents[0].rsplit(maxsplit=1)[0]
    print(dt)

Or:

for ctag in soup.find_all("ctag"):
    dt = ctag.find_next(string=True).rsplit(maxsplit=1)[0]
    print(dt)
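
If the text after the date is not always a single "Source:" token, splitting from the right may cut in the wrong place. A regular expression could be a more tolerant alternative. This is only a sketch: the pattern and the helper name extract_date are my own assumptions, based on the one sample "May 21, 2021 19:47" from the question.

```python
import re

# Assumed layout: "<Month> <day>, <year> <HH:MM>", e.g. "May 21, 2021 19:47".
# The pattern is inferred from the single sample in the question.
DATE_RE = re.compile(r"[A-Za-z]+ \d{1,2}, \d{4} \d{1,2}:\d{2}")


def extract_date(text):
    """Return the first date-time substring found in text, or None."""
    match = DATE_RE.search(text)
    return match.group(0) if match else None


print(extract_date("May 21, 2021 19:47 Source: BSE"))
```

With BeautifulSoup you would pass it the tag text, for example extract_date(ctag.get_text(" ", strip=True)), and handle the None case for tags that contain no date.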

EDIT: To get a DataFrame of the articles, you can do:

import requests
from bs4 import BeautifulSoup
import pandas as pd


url = "https://www.moneycontrol.com/company-notices/reliance-industries/notices/RI"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

data = []
for ctag in soup.select("li ctag"):
    data.append(
        {
            "title": ctag.find_next("a").get_text(strip=True),
            "date": ctag.find_next(string=True).rsplit(maxsplit=1)[0],
            "desc": ctag.find_next("p", class_="MT2").get_text(strip=True),
        }
    )

df = pd.DataFrame(data)
print(df)

Prints:

                                               title                date                                               desc
0  Reliance Industries - Compliances-Reg. 39 (3) ...  May 21, 2021 19:47  Pursuant to Regulation 39(3) of the Securities...
1  Reliance Industries - Announcement under Regul...  May 19, 2021 21:20  We refer to Regulation 5 of the SEBI (Prohibit...
2  Reliance Industries - Announcement under Regul...  May 17, 2021 17:18  In continuation of our letter dated May 15, 20...
3  Reliance Industries - Announcement under Regul...  May 17, 2021 16:06  Please find attached a media release by Relian...
4  Reliance Industries - Announcement under Regul...  May 15, 2021 15:15  The Company has, on May 15, 2021, published in...
5  Reliance Industries - Compliances-Reg. 39 (3) ...  May 14, 2021 19:44  Pursuant to Regulation 39(3) of the Securities...
6  Reliance Industries - Notice For Payment Of Fi...  May 13, 2021 22:57  We refer to our letter dated May 01, 2021.   A...
7  Reliance Industries - Announcement under Regul...  May 12, 2021 21:20  We wish to inform you that the Company partici...
8  Reliance Industries - Compliances-Reg. 39 (3) ...  May 12, 2021 19:39  Pursuant to Regulation 39(3) of the Securities...
9  Reliance Industries - Compliances-Reg. 39 (3) ...  May 11, 2021 19:49  Pursuant to Regulation 39(3) of the Securities...
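
The "date" values above are still plain strings. If you want to sort or filter by them, one option is to parse them with datetime.strptime. A minimal sketch, assuming every date follows the "May 21, 2021 19:47" layout shown in the output and that month names are spelled out in English (if the site abbreviates months, "%b" would be needed instead of "%B"). The helper name parse_notice_date and the sample titles are hypothetical.

```python
from datetime import datetime

# Assumed format, inferred from the printed dates such as "May 21, 2021 19:47".
DATE_FORMAT = "%B %d, %Y %H:%M"


def parse_notice_date(date_str):
    """Convert a scraped date string into a datetime object."""
    return datetime.strptime(date_str, DATE_FORMAT)


# Hypothetical sample rows; the two date strings are taken from the output above.
rows = [
    {"title": "older notice", "date": "May 19, 2021 21:20"},
    {"title": "newer notice", "date": "May 21, 2021 19:47"},
]

# Sort newest first by the parsed date rather than by the raw string.
rows.sort(key=lambda row: parse_notice_date(row["date"]), reverse=True)
print(rows[0]["title"])
```

strptime raises ValueError on a string that does not match the format, so unexpected rows will surface immediately instead of being silently mis-sorted.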

Upvotes: 1
