Benjamin Jabl

Reputation: 239

Working web crawler, suddenly not working anymore

I was following this tutorial and the code worked perfectly.

After working on some other projects, I came back and wanted to re-run the same code. This time I got an error message that forced me to pass features="html.parser" to the BeautifulSoup() call that creates the soup variable.

So I did, but now when I run the code, literally nothing happens. Why is that, and what am I doing wrong?

I checked whether I might have uninstalled the beautifulsoup4 module, but it is still there. I also re-typed the whole code from scratch, but nothing seems to work.

import requests
from bs4 import BeautifulSoup

def spider():
    url = "https://www.amazon.de/s?k=laptop+triton&__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&ref=nb_sb_noss"
    source = requests.get(url)
    plain_text = source.text
    soup = BeautifulSoup(plain_text, features="html.parser")

    for mylink in soup.findAll('img', {'class':'s-image'}):
        mysrc = mylink.get('src')
        print(mysrc)

spider()

Ideally, I'd want the crawler to print about 10-20 lines of src="..." values from the Amazon page in question. This code worked a couple of hours ago...

Upvotes: 0

Views: 151

Answers (1)

Andrej Kesely

Reputation: 195428

The solution is to pass headers={'User-Agent':'Mozilla/5.0'} to requests.get(). Without it, Amazon doesn't send the correct page, so the soup contains no img tags with the class s-image and the loop prints nothing:

import requests
from bs4 import BeautifulSoup

def spider():
    url = "https://www.amazon.de/s?k=laptop+triton&__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&ref=nb_sb_noss"
    # Send a browser-like User-Agent; this is the only change from the question's code.
    source = requests.get(url, headers={'User-Agent':'Mozilla/5.0'})
    plain_text = source.text
    soup = BeautifulSoup(plain_text, features="html.parser")

    for mylink in soup.find_all('img', {'class':'s-image'}):
        mysrc = mylink.get('src')
        print(mysrc)

spider()

Prints:

https://m.media-amazon.com/images/I/71YPEDap2lL._AC_UL436_.jpg
https://m.media-amazon.com/images/I/81kXH-OA6tL._AC_UL436_.jpg
https://m.media-amazon.com/images/I/81kXH-OA6tL._AC_UL436_.jpg
https://m.media-amazon.com/images/I/81fyVgZuQxL._AC_UL436_.jpg
https://m.media-amazon.com/images/I/81kXH-OA6tL._AC_UL436_.jpg
https://m.media-amazon.com/images/I/81kXH-OA6tL._AC_UL436_.jpg
https://m.media-amazon.com/images/I/71VmlANJMOL._AC_UL436_.jpg
https://m.media-amazon.com/images/I/71rAT5E7DfL._AC_UL436_.jpg
https://m.media-amazon.com/images/I/71cEKKNfb3L._AC_UL436_.jpg
https://m.media-amazon.com/images/I/61aWXuLIEBL._AC_UL436_.jpg
https://m.media-amazon.com/images/I/71B7NyjuU9L._AC_UL436_.jpg
https://m.media-amazon.com/images/I/81s822PQUcL._AC_UL436_.jpg
https://m.media-amazon.com/images/I/71fBKuAiQzL._AC_UL436_.jpg
https://m.media-amazon.com/images/I/71hXTUR-oRL._AC_UL436_.jpg
https://m.media-amazon.com/images/I/81-Lf6jX-OL._AC_UL436_.jpg
https://m.media-amazon.com/images/I/81B85jUARqL._AC_UL436_.jpg
https://m.media-amazon.com/images/I/8140E7+uhZL._AC_UL436_.jpg
https://m.media-amazon.com/images/I/8140E7+uhZL._AC_UL436_.jpg
https://m.media-amazon.com/images/I/71ROCddvJ2L._AC_UL436_.jpg
https://m.media-amazon.com/images/I/71ROCddvJ2L._AC_UL436_.jpg
https://m.media-amazon.com/images/I/41bB8HuoBYL._AC_UL436_.jpg
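
As a possible follow-up (not part of the original answer): the script in the question fails silently because a page without matching img tags just makes the loop run zero times, and the output above repeats some URLs (81kXH-OA6tL appears four times). The sketch below shows one way to make both visible. The helper names fetch_html and unique_in_order are hypothetical, the get parameter exists only so the helper can be tried with a stub instead of the network, and the status-code check is an assumption: Amazon may also answer with status 200 and a different page, in which case printing len() of the find_all() result is the more useful check.

```python
from types import SimpleNamespace  # only needed for the offline stub below


def fetch_html(url, get=None):
    """Fetch a page with a browser-like User-Agent and fail loudly on a bad status."""
    if get is None:
        import requests  # imported lazily so the helper can be tried with a stub
        get = requests.get
    response = get(url, headers={'User-Agent': 'Mozilla/5.0'})
    if response.status_code != 200:
        # Surface the problem instead of silently parsing an error page.
        raise RuntimeError(f"unexpected status {response.status_code} for {url}")
    return response.text


def unique_in_order(urls):
    """Drop repeated URLs while keeping the order in which they first appeared."""
    return list(dict.fromkeys(urls))


# Offline demonstration: a stub stands in for requests.get, so no network is used.
stub = lambda url, headers=None: SimpleNamespace(status_code=200, text="<html></html>")
print(fetch_html("https://example.com", get=stub))
print(unique_in_order(["a.jpg", "b.jpg", "b.jpg", "a.jpg"]))
```

In spider(), plain_text = fetch_html(url) would replace the requests.get() and .text lines, and the collected src values could be passed through unique_in_order() before printing.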

Upvotes: 1
