Reputation: 103
I have a bs4 app that, in this context, prints the most recent post on igg-games.com.
Code:
from bs4 import BeautifulSoup
import requests

def get_new():
    # Map each post title to (link, categories, date added).
    new = {}
    for i in BeautifulSoup(requests.get('https://igg-games.com/').text, features="html.parser").find_all('article'):
        elem = i.find('a', class_='uk-link-reset')
        new[elem.get_text()] = (elem.get('href'), ", ".join([x.get_text() for x in i.find_all('a', rel = 'category tag')]), i.find('time').get_text())
    return new

current = get_new()
new_item = list(current.items())[0]  # first entry in the dict, i.e. the first article on the page
print(f"Title: {new_item[0]}\nLink: {new_item[1][0]}\nCategories: {new_item[1][1]}\nAdded: {new_item[1][2]}")
Output on my machine:
Title: Beholder's Lair Free Download
Link: https://igg-games.com/beholders-lair-free-download.html
Categories: Action, Adventure
Added: January 7, 2021
I know it works on my machine. However, my end goal is to turn this into RSS feed entries, so I moved it all into a premium PythonAnywhere container. There, my function get_new() returns {}. Is there something I need to do that I'm missing?
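An empty dict here means find_all('article') matched nothing, which suggests the HTML returned on the host differs from what arrives locally. One rough way to check is to compare the status code, body size, and article count in both environments. The helper names summarize_response and check_site and the 10-second timeout are made up for this sketch.

```python
from bs4 import BeautifulSoup
import requests


def summarize_response(status_code, html):
    """Summarize a fetched page: status, body size, and <article> count."""
    soup = BeautifulSoup(html, features="html.parser")
    return {
        "status": status_code,
        "length": len(html),
        "articles": len(soup.find_all("article")),
    }


def check_site(url="https://igg-games.com/"):
    """Fetch url with requests' default headers and summarize the response."""
    response = requests.get(url, timeout=10)  # timeout value is an assumption
    return summarize_response(response.status_code, response.text)


# Usage: run print(check_site()) locally and on the host, then compare.
```

If the two summaries differ (for example an error status or zero articles on the host only), the request itself is being treated differently, not the parsing code.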
Upvotes: 1
Views: 219
Reputation: 103
Solved thanks to the help of Dmytro O.
It was likely that the site was blocking PythonAnywhere as a client. Sending a browser-like User-Agent header allowed me to receive a normal response from my intended site.
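For context, requests identifies itself with a python-requests/<version> User-Agent unless one is supplied, which is easy for a site to filter on. A small sketch to see that default (whether this particular site filters on it is an assumption, not something confirmed here):

```python
import requests

# The User-Agent string requests sends when none is specified.
default_ua = requests.utils.default_user_agent()
print(default_ua)  # of the form "python-requests/<version>"

# A Session starts with the same default until its headers are overridden.
session = requests.Session()
print(session.headers["User-Agent"] == default_ua)
```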
# The fix: send a browser-like User-Agent with the request
url = 'https://igg-games.com/'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
response = requests.get(url, headers=headers)
When placed in my code:
def get_new():
    new = {}
    # Browser-like User-Agent so the site does not reject the request.
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    for i in BeautifulSoup(requests.get('https://igg-games.com/', headers=headers).text, features="html.parser").find_all('article'):
        elem = i.find('a', class_='uk-link-reset')
        new[elem.get_text()] = (elem.get('href'), ", ".join([x.get_text() for x in i.find_all('a', rel = 'category tag')]), i.find('time').get_text())
    return new
This method comes from this Stack Overflow post: How to use Python requests to fake a browser visit a.k.a and generate User Agent?
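A more defensive variant could separate fetching from parsing, so that a blocked request raises an error instead of silently producing {}. The sketch below assumes the same page layout as the code above. The names parse_posts, fetch_posts, and BROWSER_UA and the 10-second timeout are illustrative and not part of the original code.

```python
from bs4 import BeautifulSoup
import requests

# Same browser-like User-Agent string as in the fix above.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
)


def parse_posts(html):
    """Return {title: (link, categories, date added)} for each <article>."""
    posts = {}
    soup = BeautifulSoup(html, features="html.parser")
    for article in soup.find_all("article"):
        link = article.find("a", class_="uk-link-reset")
        added = article.find("time")
        if link is None or added is None:
            continue  # skip articles that do not match the expected layout
        categories = ", ".join(
            a.get_text() for a in article.find_all("a", rel="category tag")
        )
        posts[link.get_text()] = (link.get("href"), categories, added.get_text())
    return posts


def fetch_posts(url="https://igg-games.com/", user_agent=BROWSER_UA):
    """Fetch the page with a browser-like User-Agent and parse its posts."""
    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return parse_posts(response.text)
```

Keeping parse_posts free of network calls also means it can be exercised with a saved copy of the page.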
Upvotes: 2