Michail

Reputation: 37

Using Python to use a website's search function

I am trying to use the search function of a website whose HTML looks like this:

<div class='search'>
<div class='inner'>
<form accept-charset="UTF-8" action="/gr/el/products" method="get"><div style="margin:0;padding:0;display:inline"><input name="utf8" type="hidden" value="&#x2713;" /></div>
<label for='query'>Ενδιαφέρομαι για...</label> <!-- "I'm interested in..." -->
<fieldset>
<input class="search-input" data-search-url="/gr/el/products/autocomplete.json" id="text_search" name="query" placeholder="Αναζητήστε προϊόν" type="text" /> <!-- placeholder: "Search for a product" -->
<button type='submit'>Αναζήτηση</button> <!-- "Search" -->
</fieldset>
</form>
</div>
</div>

using this Python script:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1'}

payload = {
    'query': 'test'
}

r = requests.get('http://www.pharmacy295.gr', data=payload, headers=headers)

soup = BeautifulSoup(r.text, 'lxml')
products = soup.findAll('span', {'class': 'name'})
print(products)

I put this code together after extensive searching on this site for how to do this, but I never get any search results, only the website's main page.
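The form markup above already states how a browser submits the search: the method, the action URL and the input names. As a minimal sketch, here is one way to read those out with the standard library's html.parser (swapped in for BeautifulSoup so it runs without extra packages). FormInspector and FORM_HTML are hypothetical names, and FORM_HTML is a trimmed copy of the question's form:

```python
from html.parser import HTMLParser

# Trimmed copy of the search form from the question (hypothetical constant name).
FORM_HTML = """
<form accept-charset="UTF-8" action="/gr/el/products" method="get">
  <input name="utf8" type="hidden" value="&#x2713;" />
  <input class="search-input" id="text_search" name="query" type="text" />
  <button type="submit">Search</button>
</form>
"""


class FormInspector(HTMLParser):
    """Records a form's action and method plus the name/value of each named <input>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values arrive with character references decoded
        if tag == "form":
            self.action = attrs.get("action")
            # HTML forms default to GET when no method is given
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and "name" in attrs:
            # An input without a value attribute submits an empty string
            self.fields[attrs["name"]] = attrs.get("value") or ""


parser = FormInspector()
parser.feed(FORM_HTML)
print(parser.method, parser.action)  # get /gr/el/products
print(parser.fields)
```

Self-closing tags such as `<input ... />` are routed to handle_starttag by HTMLParser's default handle_startendtag, so no extra override is needed. A method of get with action /gr/el/products means the browser requests that path with the fields in the query string, not the home page.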

Upvotes: 1

Views: 7495

Answers (4)

Shashwat Agrawal

Reputation: 125

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1'}

payload = {
    'query': 'test'
}

r = requests.get('http://www.pharmacy295.gr/products', data=payload, headers=headers)

soup = BeautifulSoup(r.text, 'lxml')
products = soup.findAll('span', {'class': 'name'})
print(products)

Upvotes: 0

Padraic Cunningham

Reputation: 180441

Add /products to your URL and it will work fine. The form's method is get, and its action attribute shows the URL. If you are unsure, open the developer console in Firefox or Chrome and you can see exactly how the request is made.

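import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1'}

payload = {
    'query': 'neutrogena'
}

# Request the form's action URL, not the home page
r = requests.get('http://www.pharmacy295.gr/products', data=payload, headers=headers)

soup = BeautifulSoup(r.text, 'lxml')
products = soup.findAll('span', {'class': 'name'})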
print(products)

Output:

[<span class="name">NEUTROGENA - Hand &amp; Nail Cream - 75ml</span>, <span class="name">NEUTROGENA - Hand Cream (Unscented) - 75ml</span>, <span class="name">NEUTROGENA - PROMO PACK 1+1 \u0394\u03a9\u03a1\u039f  Lip Moisturizer - 4,8gr</span>, <span class="name">NEUTROGENA - Lip Moisturizer with Nordic Berry - 4.9gr</span>]
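findAll returns Tag objects, which is why the output above still contains the markup. If you only want the product names as plain strings, calling get_text() on each tag is one option. A small sketch run against two spans copied from that output (the surrounding div is a made-up wrapper); it assumes bs4 is installed and uses the built-in "html.parser" so lxml is not required:

```python
from bs4 import BeautifulSoup

# Two <span class="name"> elements from the output above, inside a hypothetical wrapper div.
html = """
<div>
  <span class="name">NEUTROGENA - Hand &amp; Nail Cream - 75ml</span>
  <span class="name">NEUTROGENA - Hand Cream (Unscented) - 75ml</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# find_all is the current name for findAll; class_ avoids the Python keyword "class"
names = [span.get_text(strip=True) for span in soup.find_all("span", class_="name")]
print(names)
```

get_text() also decodes entities, so `&amp;` comes back as `&`.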

If you prefer, you can also get the data as JSON from the autocomplete endpoint that the search box uses:

In [13]: r = requests.get('http://www.pharmacy295.gr/el/products/autocomplete.json',data = payload ,headers = headers)

In [14]: print(r.json())
[{u'title': u'NEUTROGENA - Hand & Nail Cream - 75ml', u'discounted_price': u'5,31 \u20ac', u'photo': u'/system/uploads/asset/data/12584/tiny_108511.jpg', u'brand': u'NEUTROGENA ', u'path': u'/products/7547', u'price': u'8,17 \u20ac'}, {u'title': u'NEUTROGENA - Hand Cream (Unscented) - 75ml', u'discounted_price': u'4,03 \u20ac', u'photo': u'/system/uploads/asset/data/4689/tiny_102953.jpg', u'brand': u'NEUTROGENA ', u'path': u'/products/3958', u'price': u'6,20 \u20ac'}, {u'title': u'NEUTROGENA - PROMO PACK 1+1 \u0394\u03a9\u03a1\u039f  Lip Moisturizer - 4,8gr', u'discounted_price': u'3,91 \u20ac', u'photo': u'/system/uploads/asset/data/5510/tiny_118843.jpg', u'brand': u'NEUTROGENA ', u'path': u'/products/4644', u'price': u'4,60 \u20ac'}, {u'title': u'NEUTROGENA - Lip Moisturizer with Nordic Berry - 4.9gr', u'discounted_price': u'2,91 \u20ac', u'photo': u'/system/uploads/asset/data/12761/tiny_126088.jpg', u'brand': u'NEUTROGENA ', u'path': u'/products/7548', u'price': u'4,48 \u20ac'}]
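The JSON endpoint returns prices as display strings such as "5,31 €". If you need them as numbers, here is a minimal sketch that converts them, assuming a comma decimal separator and no thousands separator (the only format visible in the output above). parse_price is a hypothetical helper, and the records are copied from that output with the photo field omitted:

```python
from decimal import Decimal
from urllib.parse import urljoin

BASE = "http://www.pharmacy295.gr/"


def parse_price(text):
    """Convert a price string such as '5,31 €' to Decimal('5.31').

    Assumes a comma decimal separator and no thousands separator.
    """
    return Decimal(text.replace("\u20ac", "").strip().replace(",", "."))


# Two records as returned by r.json() in the session above (photo omitted).
results = [
    {"title": "NEUTROGENA - Hand & Nail Cream - 75ml", "discounted_price": "5,31 \u20ac",
     "brand": "NEUTROGENA ", "path": "/products/7547", "price": "8,17 \u20ac"},
    {"title": "NEUTROGENA - Hand Cream (Unscented) - 75ml", "discounted_price": "4,03 \u20ac",
     "brand": "NEUTROGENA ", "path": "/products/3958", "price": "6,20 \u20ac"},
]

for item in results:
    price = parse_price(item["price"])
    discounted = parse_price(item["discounted_price"])
    # "path" is relative, so join it with the site root to get a product page URL
    print(item["title"], price, discounted, urljoin(BASE, item["path"]))
```

Using Decimal rather than float keeps values like 8.17 exact, which matters if you go on to compare or sum prices.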

Upvotes: 4

mhawke

Reputation: 87084

First, the URL is wrong. You are using http://www.pharmacy295.gr, but you should be using http://www.pharmacy295.gr/gr/el/products (the form's action). This URL can be simplified to http://www.pharmacy295.gr/products.
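The full search URL comes from resolving the form's action against the site, and a GET form appends its fields as a query string. A minimal sketch with the standard library that rebuilds the URL a browser would request; the field values here are assumptions (the hidden utf8 input plus an example query):

```python
from urllib.parse import urlencode, urljoin

BASE = "http://www.pharmacy295.gr/"  # site the form was found on
ACTION = "/gr/el/products"           # the form's action attribute

# Fields a browser would submit: the hidden utf8 input plus the search box
fields = {"utf8": "\u2713", "query": "neutrogena"}

# Resolve the action against the site, then append the URL-encoded fields
search_url = urljoin(BASE, ACTION)
full_url = search_url + "?" + urlencode(fields)
print(full_url)
```

This should match what the browser's developer console shows for the search, and requests.get(search_url, params=fields) should build the same query string.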

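Also, you are making a GET request, so try params=payload rather than data=payload.

data sends the payload in the request body, which is what a POST form submission uses; params encodes it into the URL's query string, which is how a browser submits a GET form like this one.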

Here is the documentation for requests.get().
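You can see the difference without contacting the site by preparing both requests and inspecting them. A short sketch using requests.Request(...).prepare(), which builds the request exactly as it would be sent but does not send it:

```python
import requests

url = "http://www.pharmacy295.gr/products"
payload = {"query": "test"}

# Build (but do not send) the same GET request both ways
with_params = requests.Request("GET", url, params=payload).prepare()
with_data = requests.Request("GET", url, data=payload).prepare()

print(with_params.url)   # query string carries the search term
print(with_params.body)  # no body
print(with_data.url)     # URL unchanged
print(with_data.body)    # search term is only in the form-encoded body
print(with_data.headers["Content-Type"])
```

With params the search term is in the URL, where a server handling a GET form expects it. With data it travels only in the body, which many servers ignore on a GET.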

Upvotes: 1

vlad-ardelean

Reputation: 7622

Make a POST request instead:

r = requests.post('http://www.pharmacy295.gr', data=payload, headers=headers)

With a GET request, data ends up in the request body, which servers often ignore.

Upvotes: 1
