Isaiah

Reputation: 103

Beautiful soup returning empty in PythonAnywhere

I have a bs4 app that, in this context, prints the most recent post on igg-games.com.
Code:

from bs4 import BeautifulSoup
import requests

def get_new():
    new = {}
    for i in BeautifulSoup(requests.get('https://igg-games.com/').text, features="html.parser").find_all('article'):
        elem = i.find('a', class_='uk-link-reset')
        new[elem.get_text()] = (elem.get('href'), ", ".join([x.get_text() for x in i.find_all('a', rel = 'category tag')]), i.find('time').get_text())
    return new
current = get_new()
new_item = list(current.items())[0]
print(f"Title: {new_item[0]}\nLink: {new_item[1][0]}\nCategories: {new_item[1][1]}\nAdded: {new_item[1][2]}")

Output on my machine:

Title: Beholder’s Lair Free Download
Link: https://igg-games.com/beholders-lair-free-download.html
Categories: Action, Adventure
Added: January 7, 2021

I know it works. However, my end goal is to turn this into RSS feed entries, so I deployed it to a premium PythonAnywhere container. There, get_new() returns an empty dict, {}. Is there something I need to do that I'm missing?
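An empty dict means the loop over find_all('article') never ran, i.e. the HTML that came back contained no <article> elements. A minimal diagnostic sketch (the helper names describe_page and check_site are hypothetical, not part of bs4 or requests) reports the HTTP status code and the number of <article> tags, which can help tell a rejected or error page apart from a change in the site's markup:

```python
from bs4 import BeautifulSoup
import requests


def describe_page(status_code: int, html: str) -> str:
    """Summarize a fetched page: HTTP status and how many <article> tags it has."""
    articles = BeautifulSoup(html, features="html.parser").find_all('article')
    return f"status={status_code} articles={len(articles)}"


def check_site(url: str, headers=None) -> str:
    """Fetch url (optionally with custom headers) and describe the response."""
    response = requests.get(url, headers=headers, timeout=10)
    return describe_page(response.status_code, response.text)


# Usage (needs network access), e.g. from a PythonAnywhere console:
#   print(check_site('https://igg-games.com/'))
# A non-200 status, or status=200 with articles=0, suggests the request is
# being rejected or served a different page, rather than a parsing bug.
```

Running check_site once locally and once on PythonAnywhere and comparing the two summaries narrows the problem down before touching the parsing code.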

Upvotes: 1

Views: 219

Answers (1)

Isaiah

Reputation: 103

Solved, thanks to help from Dmytro O.

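Requests from PythonAnywhere were most likely being blocked because they identified themselves with the default python-requests User-Agent. Sending a browser-like User-Agent header instead allowed me to receive a normal response from the site.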
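By default, requests sends a User-Agent of the form python-requests/<version>, which some sites or their firewalls may refuse. The sketch below runs offline and compares the header each session would send: one with the defaults, and one where the browser-like string from this answer is set once on a requests.Session. Using a session is a swapped-in alternative to passing headers= on every call; the helper name user_agent_for is hypothetical.

```python
import requests

# Browser-like User-Agent string taken from this answer
BROWSER_UA = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/39.0.2171.95 Safari/537.36')


def user_agent_for(session: requests.Session, url: str) -> str:
    """Return the User-Agent header the session would send to url (no network)."""
    prepared = session.prepare_request(requests.Request('GET', url))
    return prepared.headers['User-Agent']


default_session = requests.Session()
browser_session = requests.Session()
# Headers set on a session are merged into every request it sends.
browser_session.headers.update({'User-Agent': BROWSER_UA})

print(user_agent_for(default_session, 'https://igg-games.com/'))  # python-requests/<version>
print(user_agent_for(browser_session, 'https://igg-games.com/'))  # the browser-like string
```

With browser_session, a later browser_session.get(url) would carry the custom header without repeating it at each call site.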

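import requests

# The fix: send a browser-like User-Agent instead of the python-requests default
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
url = 'https://igg-games.com/'

response = requests.get(url, headers=headers)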

When placed in my code:

from bs4 import BeautifulSoup
import requests

# Browser-like User-Agent (the fix)
HEADERS = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

def get_new():
    new = {}
    html = requests.get('https://igg-games.com/', headers=HEADERS).text
    for i in BeautifulSoup(html, features="html.parser").find_all('article'):
        elem = i.find('a', class_='uk-link-reset')
        new[elem.get_text()] = (elem.get('href'), ", ".join([x.get_text() for x in i.find_all('a', rel='category tag')]), i.find('time').get_text())
    return new
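One drawback of both versions of get_new() is that a blocked request fails silently and the function just returns {}. The sketch below is a variant, assuming the same page structure as above (article, a.uk-link-reset, a with rel="category tag", time). It separates parsing from fetching so the parser can be tested offline, and it raises on HTTP errors or an empty result. The names parse_articles and HEADERS and the timeout value are illustrative choices, not from the original code.

```python
from bs4 import BeautifulSoup
import requests

HEADERS = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}


def parse_articles(html: str) -> dict:
    """Map each post title to (link, categories, date), in page order."""
    new = {}
    for article in BeautifulSoup(html, features="html.parser").find_all('article'):
        elem = article.find('a', class_='uk-link-reset')
        if elem is None:  # skip articles without the expected title link
            continue
        categories = ", ".join(a.get_text() for a in article.find_all('a', rel='category tag'))
        time_tag = article.find('time')
        new[elem.get_text()] = (elem.get('href'), categories, time_tag.get_text() if time_tag else '')
    return new


def get_new(url: str = 'https://igg-games.com/') -> dict:
    """Fetch url with a browser-like User-Agent and parse it; raise instead of returning {}."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # a 4xx/5xx response now raises requests.HTTPError
    posts = parse_articles(response.text)
    if not posts:
        raise ValueError(f"No <article> entries found at {url}; the response may be a block page")
    return posts
```

Usage stays as in the question, e.g. next(iter(get_new().items())) for the newest post, which relies on dicts preserving insertion order (Python 3.7+).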

I found this method in the Stack Overflow post titled "How to use Python requests to fake a browser visit a.k.a and generate User Agent?"

Upvotes: 2
