zatskumar

Reputation: 51

Web Scraping Problem - Returns empty table

I am trying to scrape the search results as a table from this website: https://www.handelsregister.de/rp_web/result.do?Page=1

but all I get back is an empty table. This is the code I am using:

from urllib.request import urlopen
from bs4 import BeautifulSoup as BS

url = "https://www.handelsregister.de/rp_web/result.do?Page=1"
html = urlopen(url)

soup = BS(html, 'lxml')
# Collect every <table> element on the page
table = soup.find_all('table')
#table = soup.find_all('table', class_ = 'RegPortErg')
#table = soup.find('table', {'class': 'RegPortErg'})
print(table)

Upvotes: 1

Views: 87

Answers (2)

chitown88

Reputation: 28565

It's not a very clean table to parse, but you can get the results by submitting the search form with requests.post() and reading the response with pandas:

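from io import StringIO

import pandas as pd
import requests

# Search form endpoint; the search is submitted as a POST, not a GET
url = "https://www.handelsregister.de/rp_web/mask.do?Typ=e"

# Form fields sent with the search
payloads = {
    'suchTyp': 'e',
    'registerArt': 'HRA',
    'registerNummer': '',
    'bundeslandBW': 'on',
    'registergericht': '',
    'schlagwoerter': '',
    'schlagwortOptionen': '2',
    'niederlassung': '',
    'rechtsform': '',
    'postleitzahl': '',
    'ort': '',
    'strasse': '',
    'ergebnisseProSeite': '10',
    'btnSuche': 'Find',
}

# Browser-like User-Agent header
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'}

response = requests.post(url, data=payloads, headers=headers)

# read_html returns a list of DataFrames, one per <table> in the page;
# StringIO avoids the deprecation warning newer pandas raises for raw HTML strings
tables = pd.read_html(StringIO(response.text))

# The search results were in the second table on the page
table = tables[1]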

Output:

print(table)
                                                    0             ...                                       4
0                                        Firma / Name             ...                                     NaN
1   Baden-Württemberg  Amtsgericht  Freiburg  HRA ...             ...                                     NaN
2                                                 NaN             ...              AD  CD  HD  DK  UT  VÖ  SI
3    Baden-Württemberg  Amtsgericht  Ulm  HRA  726084             ...                                     NaN
4                                                 NaN             ...              AD  CD  HD  DK  UT  VÖ  SI
5   Baden-Württemberg  Amtsgericht  Mannheim  HRA ...             ...                                     NaN
6                                                 NaN             ...              AD  CD  HD  DK  UT  VÖ  SI
7   Baden-Württemberg  Amtsgericht  Mannheim  HRA ...             ...                                     NaN
8                                                 NaN             ...              AD  CD  HD  DK  UT  VÖ  SI
9                                                 NaN             ...                                     NaN
10                                                NaN             ...                                     NaN
11                                                NaN             ...                                     NaN
12  Baden-Württemberg  Amtsgericht  Mannheim  HRA ...             ...                                     NaN
13                                                NaN             ...              AD  CD  HD  DK  UT  VÖ  SI
14  Baden-Württemberg  Amtsgericht  Freiburg  HRA ...             ...                                     NaN
15                                                NaN             ...              AD  CD  HD  DK  UT  VÖ  SI
16                                                NaN             ...                                     NaN
17                                                NaN             ...                                     NaN
18                                                NaN             ...                                     NaN
19  Baden-Württemberg  Amtsgericht  Mannheim  HRA ...             ...                                     NaN
20                                                NaN             ...              AD  CD  HD  DK  UT  VÖ  SI
21                                                NaN             ...                                     NaN
22                                                NaN             ...                                     NaN
23  Baden-Württemberg  Amtsgericht  Stuttgart  HRA...             ...                                     NaN
24                                                NaN             ...              AD  CD  HD  DK  UT  VÖ  SI
25                                                NaN             ...                                     NaN
26                                                NaN             ...                                     NaN
27  Baden-Württemberg  Amtsgericht  Freiburg  HRA ...             ...                                     NaN
28                                                NaN             ...              AD  CD  HD  DK  UT  VÖ  SI
29                                                NaN             ...                                     NaN
30                                                NaN             ...                                     NaN
31  Baden-Württemberg  Amtsgericht  Mannheim  HRA ...             ...                                     NaN
32                                                NaN             ...              AD  CD  HD  DK  UT  VÖ  SI

[33 rows x 5 columns]
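
Since the question uses urllib, here is a rough sketch of how the same form submission could be built with the standard library only. It shows that requests.post(url, data=...) boils down to a form-encoded POST body. The helper name build_search_request and the shortened payload are my own illustration, not anything from the site, and I have not checked whether the site accepts a request built this way.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Same endpoint as in the answer above.
URL = "https://www.handelsregister.de/rp_web/mask.do?Typ=e"


def build_search_request(payload, user_agent="Mozilla/5.0"):
    """Build (but do not send) a form-encoded POST request.

    build_search_request is a hypothetical helper name used for illustration.
    """
    # Passing a bytes body is what turns the Request into a POST.
    body = urlencode(payload).encode("utf-8")
    return Request(URL, data=body, headers={"User-Agent": user_agent})


# Shortened payload for illustration; the full field list is in the answer above.
request = build_search_request(
    {"suchTyp": "e", "registerArt": "HRA", "bundeslandBW": "on"}
)
print(request.get_method())
print(request.data)
```

To actually send it you would pass the object to urlopen(request) and parse the returned HTML as before.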

Upvotes: 0

Mehrdad Pedramfar

Reputation: 11073

Try replacing html = urlopen(url) with the following, so that the response body is read before it is passed to BeautifulSoup:

html = urlopen(url).read()
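
Whichever way the page is fetched, it can help to check whether the returned HTML contains any table rows at all before blaming the parser. Below is a minimal sketch using only the standard library. TableCounter, count_tables and the sample bytes are hypothetical names and data for illustration; in practice you would pass in the bytes returned by urlopen(url).read().

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Count <table> and <tr> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tables = 0
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag == "table":
            self.tables += 1
        elif tag == "tr":
            self.rows += 1


def count_tables(raw, encoding="utf-8"):
    """Return (table_count, row_count) for HTML given as bytes or str."""
    text = raw.decode(encoding, errors="replace") if isinstance(raw, bytes) else raw
    counter = TableCounter()
    counter.feed(text)
    counter.close()
    return counter.tables, counter.rows


# Hypothetical sample standing in for the bytes returned by urlopen(url).read().
sample = b"<html><body><table><tr><td>Firma / Name</td></tr></table></body></html>"
print(count_tables(sample))
```

If the row count comes back as zero for the real page, the server did not send any results in the first place, and no change of parser will produce a filled table.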

Upvotes: 1
