Jayanth Yetukuri

Reputation: 1

Python requests.get() not fetching the complete page data

I am trying to scrape the 50 best albums of the decade (2000-2009) from this page: https://www.pastemagazine.com/blogs/lists/2009/11/the-best-albums-of-the-decade.html?a=1

I am using the following code in Python:

from requests import get 
url = 'https://www.pastemagazine.com/blogs/lists/2009/11/the-best-albums-of-the-decade.html?a=2'
response = get(url) 
print(response.text)

When I print the response, the information for the 50 best albums is missing from the output. When I view the page source in the browser, however, I can see this information under <div class="grid-x article-wrapper">. What do I need to do to scrape this part of the page?

Upvotes: 0

Views: 2026

Answers (1)

SIM

Reputation: 22440

You need to send a User-Agent header so that the request looks like it comes from a real browser. The following should work.

import requests
from bs4 import BeautifulSoup

url = 'https://www.pastemagazine.com/blogs/lists/2009/11/the-best-albums-of-the-decade.html?a=2'

# Send a browser-like User-Agent header with the request.
res = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
soup = BeautifulSoup(res.text, "lxml")

# Select every <b> that is a direct child of a <b class="big">.
for item in soup.select("b.big > b"):
    print(item.text)

The output looks like this:

50. Björk: Vespertine [Elektra] 2001
49. Libertines: Up The Bracket [Rough Trade] (2002)
48. Loretta Lynn: Van Lear Rose [Interscope] (2004)
47. Arctic Monkeys: Whatever People Say I Am, That’s What I’m Not [Domino] (2006)
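
If you want to confirm that the header is what makes the difference, a minimal sketch like the one below may help. It checks whether a response body contains the <div class="grid-x article-wrapper"> container mentioned in the question. The names has_container and fetch are hypothetical helpers made up for this illustration, the 10-second timeout is an arbitrary choice, and the class check uses the standard library's html.parser rather than BeautifulSoup so that it runs without extra packages.

```python
from html.parser import HTMLParser


class _ClassFinder(HTMLParser):
    """Records whether any tag carries all of the wanted CSS classes."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.found = True


def has_container(html, wanted=("grid-x", "article-wrapper")):
    """Return True if some tag in `html` has every class in `wanted`."""
    finder = _ClassFinder(wanted)
    finder.feed(html)
    return finder.found


def fetch(url, user_agent=None):
    """Hypothetical helper: GET `url`, optionally with a User-Agent header.

    Returns (status_code, body_text) so both can be compared.
    """
    import requests  # third-party; imported here so the parser above works without it

    headers = {"User-Agent": user_agent} if user_agent else {}
    res = requests.get(url, headers=headers, timeout=10)
    return res.status_code, res.text
```

To compare, call fetch(url) and fetch(url, "Mozilla/5.0"), then pass each returned body to has_container. If only the second call yields True, the missing header was the cause; if both yield False, the content is probably loaded some other way and this approach will not be enough.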

Upvotes: 1
