Reputation: 379
I'm trying to get the text and href of the top news items, but I'm not able to parse them.
Website: News site
import requests
from bs4 import BeautifulSoup

def checkResponse(url):
    response = requests.get(url)
    if response.status_code == 200:
        return response.content
    else:
        return None

def getTitleURL():
    url = 'https://www.gujaratsamachar.com/'
    response = checkResponse(url)
    if response is not None:
        html = BeautifulSoup(response, 'html.parser')
        for values in html.find_all('div', class_='main-news'):
            print(values.a.href)

if __name__ == '__main__':
    print('Getting the list of names....')
    names = getTitleURL()
    print('... done.\n')
The output is empty.
I'm trying to scrape the part highlighted in red:
The elements look like this:
Upvotes: 0
Views: 123
Reputation: 11525
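import requests

# Keys to read from each story in the JSON response.
data = ["heading", "categorySlug", "articleUrl"]

def main(url):
    # Query the stories API directly instead of parsing the HTML page.
    r = requests.get(url).json()
    for item in r['data']:
        goal = [item[d] for d in data]
        # url[:31] is the site root, "https://www.gujaratsamachar.com".
        print(goal[0], f"{url[:31]}/news/{'/'.join(goal[1:])}")

main("https://www.gujaratsamachar.com/api/stories/5993f2835b03ab694185ad25?type=top-stories")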
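If you want something a bit more defensive, here is a minimal sketch of the same idea, assuming the response keeps the shape used above (a top-level `data` list whose items carry `heading`, `categorySlug` and `articleUrl`). The names `build_links` and `fetch_top_stories`, the 10-second timeout and the choice to skip incomplete items are my own assumptions, not something the site documents.

```python
BASE_URL = "https://www.gujaratsamachar.com"
API_URL = BASE_URL + "/api/stories/5993f2835b03ab694185ad25?type=top-stories"
FIELDS = ("heading", "categorySlug", "articleUrl")


def build_links(payload, base_url=BASE_URL):
    """Return (heading, link) pairs from an already decoded JSON payload.

    Hypothetical helper: assumes payload["data"] is a list of dicts.
    """
    links = []
    for item in payload.get("data", []):
        # Skip entries that lack any of the expected keys (assumed policy).
        if not all(key in item for key in FIELDS):
            continue
        heading, category, slug = (item[key] for key in FIELDS)
        links.append((heading, f"{base_url}/news/{category}/{slug}"))
    return links


def fetch_top_stories(url=API_URL, timeout=10):
    """Download the stories JSON and turn it into (heading, link) pairs."""
    # Imported here so build_links can be used without the dependency.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return build_links(response.json())
```

Calling `fetch_top_stories()` and looping over the result with `print(heading, link)` should give the same kind of output as the snippet above. Keeping the parsing in `build_links` means it can be checked against a saved response without hitting the network.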
Upvotes: 1