Reputation: 1169
In one part of my crawler, I need to scrape the publication date and time of a YouTube video as a datetime. I am using bs4, and so far I can get the published date the way the YouTube GUI shows it, i.e. "Published on 6th May, 2017". But I cannot retrieve the actual datetime. How can I do this?
My code:
video_obj["date_published"] = video_soup.find("strong", attrs={"class": "watch-time-text"}).text
return video_obj["date_published"]
The output:
Published on Feb 8, 2020
The way I want:
YYYY-MM-DD HH:MM:SS
Upvotes: 0
Views: 1103
Reputation: 351
You could use Python's datetime module to parse the string and format the output.
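import datetime
pubstring = video_obj["date_published"]  # "Published on Feb 8, 2020"
# pubstring[13:] skips the first 13 characters ("Published on ")
dt = datetime.datetime.strptime(pubstring[13:], "%b %d, %Y")
# %F is not supported on every platform, so spell the format out
return dt.strftime("%Y-%m-%d %H:%M:%S")  # format as needed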
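If the label does not always start with exactly "Published on ", slicing at a fixed offset will break. A minimal sketch of a more tolerant variant is below; the helper name `published_to_timestamp` and the regular expression are illustrative assumptions, and `%b` assumes English month abbreviations in the current locale.

```python
import re
from datetime import datetime

# Hypothetical pattern: three-letter month, day, comma, four-digit year.
DATE_RE = re.compile(r"[A-Za-z]{3} \d{1,2}, \d{4}")


def published_to_timestamp(text):
    """Return 'YYYY-MM-DD HH:MM:SS' for text like 'Published on Feb 8, 2020'."""
    match = DATE_RE.search(text)
    if match is None:
        raise ValueError(f"no date found in {text!r}")
    # %b matches abbreviated month names in the current locale (English assumed).
    parsed = datetime.strptime(match.group(0), "%b %d, %Y")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


print(published_to_timestamp("Published on Feb 8, 2020"))  # 2020-02-08 00:00:00
```

The time part is always 00:00:00 here, because the scraped text only carries a date.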
Upvotes: 1
Reputation: 980
Once you get:
Published on Feb 8, 2020
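You can do the following to remove "Published on" (str.strip is not reliable for this, because it removes a set of characters from both ends rather than a prefix):
date_string = soup_string.replace("Published on", "", 1).strip()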
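One caveat about `str.strip`: its argument is treated as a set of characters to trim from both ends, not as a prefix. A small sketch showing the difference follows; `remove_prefix` is a hypothetical helper written for illustration, not part of any library.

```python
def remove_prefix(text, prefix):
    """Drop prefix from the start of text if present (hypothetical helper)."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


label = "Published on Feb 8, 2020"
print(remove_prefix(label, "Published on "))  # Feb 8, 2020

# strip() trims any of the characters in "Published on" from both ends,
# so it can also eat the tail of a string:
print("Published on Feb 8, 2020 in Sweden".strip("Published on"))  # Feb 8, 2020 in Sw
```

On Python 3.9 and later, the built-in `str.removeprefix` does the same job as the helper.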
To get this in the format YYYY-MM-DD HH:MM:SS, you can use the python-dateutil library. You can install it using:
pip install python-dateutil
Code:
from dateutil import parser
formatted_date = parser.parse("Published on Feb 8, 2020", fuzzy=True)
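parser.parse returns a datetime object. Printing it, or calling str() on it, gives the date as YYYY-MM-DD HH:MM:SS. Since the text contains no time of day, the time part is 00:00:00.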
You can read more about the parser in the python-dateutil documentation.
Upvotes: 1