robots.txt

Reputation: 137

Can't make my script produce a specific URL using params

I've written a Python script that fetches only the line showing how many results a webpage lists. With the URL my script builds, I get something like Showing 1-30 of 18893 (not what I want), whereas the link below gives Showing 1-30 of 196 (the expected output). In short: the direct link works, but the URL that requests builds from params returns something else.

url of that site

I've tried:

import requests
from bs4 import BeautifulSoup

link = "https://www.yelp.com/search?"

params = {
    'find_desc': 'Restaurants',
    'find_loc': 'New York, NY',
    'l: p':'NY:New_York:Manhattan:Alphabet_City'
}

resp = requests.get(link,params=params)
soup = BeautifulSoup(resp.text,"lxml")
total = soup.select_one("p:contains(Showing)").text
print(total)

Getting:

Showing 1-30 of 18894

Expected output:

Showing 1-30 of 196

Moreover, the link I get using resp.url:

https://www.yelp.com/search?find_desc=Restaurants&find_loc=New+York%2C+NY&l%3A+p=NY%3ANew_York%3AManhattan%3AAlphabet_City

But the link I expect is:

https://www.yelp.com/search?find_desc=Restaurants&find_loc=New%20York%2C%20NY&l=p%3ANY%3ANew_York%3AManhattan%3AAlphabet_City

How can I make the script build the right URL so that I get the expected content?

Upvotes: 0

Views: 37

Answers (1)

abdusco

Reputation: 11101

You have a typo in the 'l: p':'NY:New_York:Manhattan:Alphabet_City' parameter: the key should be 'l', and the value should start with 'p:'.

It's a good idea to parse the working URL's query string with urllib.parse.parse_qs and copy the parameters from the result, rather than trying to decode it by hand.
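As a minimal sketch of that approach, the snippet below feeds the URL from the question (the one that returns the expected result) through the standard library. The variable names `expected_url` and `params` are only illustrative.

```python
from urllib.parse import urlsplit, parse_qs

# The working URL quoted in the question.
expected_url = (
    "https://www.yelp.com/search?find_desc=Restaurants"
    "&find_loc=New%20York%2C%20NY"
    "&l=p%3ANY%3ANew_York%3AManhattan%3AAlphabet_City"
)

# Take only the part after "?" and decode it.
query = urlsplit(expected_url).query
parsed = parse_qs(query)

# parse_qs returns a list per key (a key may repeat); keep the single value here.
params = {key: values[0] for key, values in parsed.items()}
print(params)
```

The resulting dictionary has the key 'l' with the value 'p:NY:New_York:Manhattan:Alphabet_City', which makes the misplaced 'p' in the original 'l: p' key easy to spot.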

Here's the fixed version:

import requests
from bs4 import BeautifulSoup

link = "https://www.yelp.com/search"

params = {
    'find_desc': 'Restaurants',
    'find_loc': 'New York, NY',
    'l': 'p:NY:New_York:Manhattan:Alphabet_City'
}

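res = requests.get(link, params=params)  # requests URL-encodes params into the query string
soup = BeautifulSoup(res.text, 'html.parser')
print(res.url)  # check the URL that was actually requested
# Note: newer soupsieve releases deprecate :contains() in favor of :-soup-contains()
total = soup.select_one("p:contains(Showing)").text
print(total)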

output:

https://www.yelp.com/search?find_desc=Restaurants&find_loc=New+York%2C+NY&l=p%3ANY%3ANew_York%3AManhattan%3AAlphabet_City
Showing 1-30 of 196
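The URL printed above encodes spaces as + while the link in the question uses %20. In a query string both decode to a space, so the two forms carry the same parameters. The sketch below illustrates this with the standard library only; it does not call requests, and the names `plus_style` and `percent_style` are just for illustration.

```python
from urllib.parse import urlencode, quote, parse_qs

params = {
    'find_desc': 'Restaurants',
    'find_loc': 'New York, NY',
    'l': 'p:NY:New_York:Manhattan:Alphabet_City',
}

# Default form encoding: spaces become "+".
plus_style = urlencode(params)

# Same parameters, but spaces become "%20".
percent_style = urlencode(params, quote_via=quote)

print(plus_style)
print(percent_style)

# Both query strings decode to identical parameters.
print(parse_qs(plus_style) == parse_qs(percent_style))
```

So there should be no need to force %20 in the script; what mattered was the parameter name.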

Upvotes: 1
