Triliang123

Reputation: 73

Why is OUTPUT empty after scraping web site?

Can a website block a Python script from scraping values from it (via BeautifulSoup)?

I use this script:

import gspread
import requests
from bs4 import BeautifulSoup

URL = 'https://www.sreality.cz/hledani/prodej/byty/praha?velikost=1%2Bkk'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0'}
response = requests.get(URL, headers=headers)

# Scraping the eurobydleni.cz website
results = soup.find_all('div', attrs={'class':'text-wrap'})
for job in results:

    nemovitost = job.find('span', attrs={'class':'name ng-binding'})
    nemovitost_final = nemovitost.text.strip()

    print(nemovitost_final)

But the output is empty. The script starts and then ends almost immediately.

I need to print the contents of <span class="name ng-binding">Prodej bytu 1+kk 33&nbsp;m²</span>

So the expected output is 'Prodej bytu 1+kk', 'Prodej bytu 1+kk', and so on.

Edit: Using help from @Andrej Kesely:

I tried your code (inside my script, which inserts the values into a Google Sheet), but I got an error.

import gspread
import requests
import datetime 
import json
from bs4 import BeautifulSoup
from oauth2client.service_account import ServiceAccountCredentials
from pprint import pprint
from datetime import timedelta
import time

datetime.datetime.now()

scope = [
'https://www.googleapis.com/auth/spreadsheets',
'https://www.googleapis.com/auth/drive'
]

api_url = 'https://www.sreality.cz/api/cs/v2/estates?category_main_cb=1&category_sub_cb=2&category_type_cb=1&locality_region_id=10&per_page=20'
data = requests.get(api_url).json()

# communication with Excel (the Google Sheet)
data = ServiceAccountCredentials.from_json_keyfile_name("data.json", scope)
client = gspread.authorize(data)
sheet = client.open("skript").worksheet('sreality.cz')
data = sheet.get_all_records()

# write to LOG
sheet2 = client.open("skript").worksheet('LOG')
data = sheet2.get_all_records()

insertRow = ["sreality.cz", "START: " + str(datetime.datetime.now().strftime('%d-%m-%Y ve %H:%M:%S'))]
sheet2.insert_row(insertRow,2)

for estate in data["_embedded"]["estates"]:

    insertRow = ["{:<30} {:<30} {} {}".format(estate["name"], estate["price"], estate["locality"])]
    sheet.insert_row(insertRow,2)

insertRow = ["sreality.cz", "KONEC: " + str(datetime.datetime.now().strftime('%d-%m-%Y ve %H:%M:%S'))]
sheet2.insert_row(insertRow,2)
time.sleep(60)

Error:

Traceback (most recent call last):
  File "c:/Skola-Projekty/python/byt/sreality.cz.py", line 34, in <module>
    for estate in data["_embedded"]["estates"]:
TypeError: list indices must be integers or slices, not str
PS C:\Skola-Projekty\python\byt> 

Edit2: Using help from @Andrej Kesely:

I used the code, but it does not split each line into columns. It puts all the data for a flat into one cell and then moves on to the next row. I need the data split into 3 columns. Is there a way to do that with your code, please?

Output in the Google Sheet:

Flat:                                                           Price Address
Prodej bytu 1+kk 23 m²  2827000  Římská, Praha 2 - Vinohrady
Prodej bytu 1+kk 27 m²  4049000  Ječná, Praha 2 - Nové Město
Prodej bytu 1+kk 33 m²  6005000  Záhřebská, Praha 2 - Vinohrady

I need:

Flat:                           Price:             Address:
Prodej bytu 1+kk 23 m²          2827000            Římská, Praha 2 - Vinohrady
Prodej bytu 1+kk 27 m²          4049000            Ječná, Praha 2 - Nové Město
Prodej bytu 1+kk 33 m²          6005000            Záhřebská, Praha 2 - Vinohrady

Upvotes: 1

Views: 168

Answers (2)

Andrej Kesely

Reputation: 195408

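The data is loaded via Ajax from an external URL, so it is not present in the HTML that requests downloads, and BeautifulSoup finds nothing to match. Here is an example of how to load the data directly: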

import json
import requests


api_url = "https://www.sreality.cz/api/cs/v2/estates?category_main_cb=1&category_sub_cb=2&category_type_cb=1&locality_region_id=10&per_page=20"
data = requests.get(api_url).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for estate in data["_embedded"]["estates"]:
    print("{:<30} {}".format(estate["name"], estate["price"]))

Prints:

Prodej bytu 1+kk 33 m²         4809347
Prodej bytu 1+kk 32 m²         5493000
Prodej bytu 1+kk 44 m²         6167000
Prodej bytu 1+kk 23 m²         2896000
Prodej bytu 1+kk 26 m²         3320000
Prodej bytu 1+kk 20 m²         2715000
Prodej bytu 1+kk 36 m²         3600000
Prodej bytu 1+kk 44 m²         4770000
Prodej bytu 1+kk 18 m²         3850000
Prodej bytu 1+kk 33 m²         5226000
Prodej bytu 1+kk 15 m²         2950000
Prodej bytu 1+kk 15 m²         2950000
Prodej bytu 1+kk 15 m²         2950000
Prodej bytu 1+kk 36 m²         5248000
Prodej bytu 1+kk 22 m²         3990000
Prodej bytu 1+kk 80 m²         6300000
Prodej bytu 1+kk 46 m²         6394000
Prodej bytu 1+kk 33 m²         3469000
Prodej bytu 1+kk 39 m²         5099000
Prodej bytu 1+kk 32 m²         4250000
Prodej bytu 1+kk 30 m²         4759000
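
On the TypeError in the question's edit: the name data is reused there for the API response, the credentials, and the result of get_all_records(), so by the time the loop runs it holds a list of sheet rows rather than the API JSON. Below is a minimal sketch of the same effect without any network or Google access. The names api_data, sheet_records, and estate_names are stand-ins chosen for illustration, and the sample values are made up to match the shapes involved.

```python
# Stand-in for the parsed API response (shape taken from the loop above).
api_data = {
    "_embedded": {
        "estates": [{"name": "Prodej bytu 1+kk 33 m²", "price": 4809347}]
    }
}

# Stand-in for what get_all_records() returns: a list of dicts, one per row.
sheet_records = [{"Flat": "x", "Price": 1}]


def estate_names(data):
    """Return the estate names from an API-shaped dict."""
    return [estate["name"] for estate in data["_embedded"]["estates"]]


# Reusing one name reproduces the error from the traceback.
data = api_data
data = sheet_records  # overwrites the API response
try:
    estate_names(data)
    error = None
except TypeError as exc:
    error = str(exc)

# Keeping separate names avoids it.
names = estate_names(api_data)
```

In the question's script this would mean keeping data for the API response only and giving the credentials and the sheet records their own names. Separately, the format string on the insertRow line there has four {} placeholders but only three arguments, which would raise an IndexError once the loop actually runs.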

Upvotes: 1

Triliang123

Reputation: 73

In Edit2 I asked how to split the values into columns; here is my final solution:

    # inside the loop over the estates; each list element goes into its own column
    insertRow = ['sreality.cz', "{:<30}".format(estate["name"]), "{:<30}".format(estate["locality"]), "{:<30}".format(estate["price"]), str(pocet_bytu)]
    sheet.insert_row(insertRow,2)
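
This works because insert_row takes a list and writes each element into its own cell, so it is the one-element-per-column list that splits the data. A minimal sketch of building such a row, assuming the name, price, and locality keys used above. The helper estate_to_row is hypothetical, and the {:<30} padding is left out here on the assumption that alignment is handled by the sheet's columns.

```python
def estate_to_row(estate, source="sreality.cz"):
    """Build one sheet row: one list element per column."""
    return [source, estate["name"], estate["locality"], str(estate["price"])]


# Sample values copied from the output shown in the question.
estate = {
    "name": "Prodej bytu 1+kk 23 m²",
    "price": 2827000,
    "locality": "Římská, Praha 2 - Vinohrady",
}
row = estate_to_row(estate)
# row can then be passed to sheet.insert_row(row, 2) as in the code above
```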

Thanks to @Andrej Kesely for the help!

Upvotes: 1
