Elliott Weiss

Reputation: 199

Trying to make a Reddit image scraper with bs4

import requests
from bs4 import BeautifulSoup as bs
import os
url = 'https://www.reddit.com/r/memes'
req = requests.get(url)
parser = bs(req.text,'html.parser')
imgs = parser.findAll('img',{"src":True})
rep = 0
print(len(imgs))
os.chdir(r'C:\Users\ellio\Desktop\my code\mm\images')
for img in imgs:
    # only save images that belong to a post
    if img.get('alt') != 'Post image':
        continue
    im = requests.get(img['src'])
    with open(str(rep)+'.jpg','wb') as file:
        file.write(im.content)
    rep+=1
    if rep == 25:
        break

This is supposed to scrape images from r/memes. When I run it, it finishes without doing anything.

Upvotes: 1

Views: 762

Answers (1)

Andrej Kesely

Reputation: 195438

Reddit renders the page with JavaScript, so the HTML that requests receives does not contain the post images. You can instead append .json to the Reddit URL and get a JSON feed:

import json
import requests


url = "https://old.reddit.com/r/memes.json"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0"
}
data = requests.get(url, headers=headers).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for ch in data["data"]["children"]:
    pic_url = ch["data"].get("url_overridden_by_dest")
    if pic_url:
        # drop any query string so the name is valid as a file name (e.g. on Windows)
        file_name = pic_url.split("/")[-1].split("?")[0]
        if "." not in file_name:
            continue
        with open(file_name, "wb") as f_out:
            print("Downloading {}".format(pic_url))
            c = requests.get(pic_url, headers=headers).content
            f_out.write(c)

This downloads the files and prints:

Downloading https://i.redd.it/l34mx68djsr61.png
Downloading https://media4.giphy.com/media/VASgH937CSYF969Q1w/giphy.gif?cid=4d1e4f2965663c56e168c398f4dcd35f7b31f1451e568a06&rid=giphy.gif&ct=g
Downloading https://i.redd.it/oy0xme2ncsr61.jpg
Downloading https://i.redd.it/02rwljynxrr61.jpg
Downloading https://i.redd.it/aj9ste2vmsr61.jpg
Downloading https://i.redd.it/9wc4vm7nisr61.jpg
Downloading https://i.redd.it/mqfrhqnrnsr61.jpg
Downloading https://i.redd.it/hcbirqok3sr61.jpg
Downloading https://i.redd.it/da6dz6m0jsr61.jpg
Downloading https://i.redd.it/e9gtf4z29sr61.jpg
Downloading https://i.redd.it/o24odz06trr61.png
Downloading https://i.redd.it/zna7xkkncsr61.png
Downloading https://i.redd.it/j77ovgrovrr61.png
Downloading https://i.redd.it/9acir5koprr61.png
Downloading https://i.redd.it/m7th84obcsr61.png
Downloading https://i.redd.it/xh512h94zsr61.jpg
Downloading https://i.redd.it/p2zkd1opcsr61.jpg
Downloading https://i.redd.it/pfe0lgl0zrr61.jpg
Downloading https://i.redd.it/pwqg36zcesr61.jpg
Downloading https://i.redd.it/gmfzwxgywrr61.png
Downloading https://i.redd.it/9za9i15ywrr61.jpg
Downloading https://i.redd.it/h844j0658sr61.jpg
Downloading https://i.redd.it/oqeuzso2prr61.gif
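
If you also want the 25-image cap and the target folder from the question, a minimal sketch could look like the one below. The names image_links, download_images and IMAGE_EXTENSIONS are my own (hypothetical helpers, not part of requests or Reddit's API), and the extension whitelist is an assumption about which links count as images. It reads the same url_overridden_by_dest field as the code above:

```python
import os
from urllib.parse import urlparse

# Assumption: only these extensions are treated as images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def image_links(listing, limit=25):
    """Return up to `limit` (url, file_name) pairs for posts that link to an image."""
    links = []
    for child in listing["data"]["children"]:
        if len(links) >= limit:
            break
        pic_url = child["data"].get("url_overridden_by_dest")
        if not pic_url:
            continue
        # Use only the path part of the URL so a query string never ends up in the name.
        file_name = os.path.basename(urlparse(pic_url).path)
        if os.path.splitext(file_name)[1].lower() not in IMAGE_EXTENSIONS:
            continue
        links.append((pic_url, file_name))
    return links


def download_images(listing, target_dir, headers, limit=25):
    """Download the images found by image_links() into target_dir."""
    import requests  # imported here so image_links() works without requests installed

    os.makedirs(target_dir, exist_ok=True)
    for pic_url, file_name in image_links(listing, limit):
        print("Downloading {}".format(pic_url))
        response = requests.get(pic_url, headers=headers, timeout=30)
        response.raise_for_status()
        with open(os.path.join(target_dir, file_name), "wb") as f_out:
            f_out.write(response.content)
```

With data and headers from the code above, you would call it as download_images(data, "images", headers). Note that two posts whose links end in the same file name would overwrite each other; this sketch does not handle that.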

Upvotes: 2
