I am trying to scrape the search results table from this website: https://www.handelsregister.de/rp_web/result.do?Page=1
but my code returns an empty result instead of the table. I am using this code:
from urllib.request import urlopen
from bs4 import BeautifulSoup as BS
url = "https://www.handelsregister.de/rp_web/result.do?Page=1"
html = urlopen(url)             # file-like HTTP response object
soup = BS(html, 'lxml')
table = soup.find_all('table')  # a list; [] if the page has no <table>
#table = soup.find_all('table', class_ = 'RegPortErg')
#table = soup.find('table', {'class': 'RegPortErg'})
print(table)
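An empty result means the fetched HTML simply contains no matching table: find_all() always returns a list (empty when nothing matches), while find() returns None. A minimal offline sketch of both calls, assuming two made-up HTML strings in place of the live page and the built-in 'html.parser' instead of lxml:

```python
from bs4 import BeautifulSoup

# Made-up stand-in for a page that has no results table.
EMPTY_PAGE = "<html><body><p>No search submitted</p></body></html>"

# Made-up stand-in for a page that does contain the results table.
RESULT_PAGE = (
    "<html><body>"
    '<table class="RegPortErg"><tr><td>Firma / Name</td></tr></table>'
    "</body></html>"
)

def find_results(html: str):
    """Return (find_all result, find result) for the RegPortErg table."""
    # 'html.parser' ships with Python, so lxml is not required here.
    soup = BeautifulSoup(html, "html.parser")
    all_tables = soup.find_all("table", class_="RegPortErg")  # always a list
    first_table = soup.find("table", {"class": "RegPortErg"})  # Tag or None
    return all_tables, first_table

tables, first = find_results(EMPTY_PAGE)
print(len(tables), first)              # 0 None
tables, first = find_results(RESULT_PAGE)
print(len(tables), first.get_text())   # 1 Firma / Name
```

So if the live page prints an empty list, the problem lies in what the request returns, not in the BeautifulSoup call.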
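The result page appears to be filled only after the search form has been submitted, so instead of fetching result.do directly you can submit the form with requests.post() and parse the response with pandas.read_html(). It's not a very clean table to parse, but this works: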
import io
from bs4 import BeautifulSoup as BS
import requests
import pandas as pd
# Search form endpoint; the results come back in the POST response
url = "https://www.handelsregister.de/rp_web/mask.do?Typ=e"
payloads = {
'suchTyp': 'e',
'registerArt': 'HRA',
'registerNummer': '',
'bundeslandBW': 'on',
'registergericht': '',
'schlagwoerter': '',
'schlagwortOptionen': '2',
'niederlassung': '',
'rechtsform': '',
'postleitzahl': '',
'ort': '',
'strasse': '',
'ergebnisseProSeite': '10',
'btnSuche': 'Find'}
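headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'}
# Submit the search form with the payload above
html = requests.post(url, data=payloads, headers=headers)
# read_html returns one DataFrame per <table>; newer pandas expects a file-like object
tables = pd.read_html(io.StringIO(html.text))
table = tables[1]  # the results table is the second table on the page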
Output of print(table):
0 ... 4
0 Firma / Name ... NaN
1 Baden-Württemberg Amtsgericht Freiburg HRA ... ... NaN
2 NaN ... AD CD HD DK UT VÖ SI
3 Baden-Württemberg Amtsgericht Ulm HRA 726084 ... NaN
4 NaN ... AD CD HD DK UT VÖ SI
5 Baden-Württemberg Amtsgericht Mannheim HRA ... ... NaN
6 NaN ... AD CD HD DK UT VÖ SI
7 Baden-Württemberg Amtsgericht Mannheim HRA ... ... NaN
8 NaN ... AD CD HD DK UT VÖ SI
9 NaN ... NaN
10 NaN ... NaN
11 NaN ... NaN
12 Baden-Württemberg Amtsgericht Mannheim HRA ... ... NaN
13 NaN ... AD CD HD DK UT VÖ SI
14 Baden-Württemberg Amtsgericht Freiburg HRA ... ... NaN
15 NaN ... AD CD HD DK UT VÖ SI
16 NaN ... NaN
17 NaN ... NaN
18 NaN ... NaN
19 Baden-Württemberg Amtsgericht Mannheim HRA ... ... NaN
20 NaN ... AD CD HD DK UT VÖ SI
21 NaN ... NaN
22 NaN ... NaN
23 Baden-Württemberg Amtsgericht Stuttgart HRA... ... NaN
24 NaN ... AD CD HD DK UT VÖ SI
25 NaN ... NaN
26 NaN ... NaN
27 Baden-Württemberg Amtsgericht Freiburg HRA ... ... NaN
28 NaN ... AD CD HD DK UT VÖ SI
29 NaN ... NaN
30 NaN ... NaN
31 Baden-Württemberg Amtsgericht Mannheim HRA ... ... NaN
32 NaN ... AD CD HD DK UT VÖ SI
[33 rows x 5 columns]
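The output interleaves company rows with rows that hold only the document links (AD CD HD ...) and spacer rows. A minimal sketch that wraps the POST above in a function and keeps just the company rows, assuming (from the printed output) that those are the rows with text in column 0 and that "Firma / Name" is the header row. build_payload, fetch_result_table and company_rows are hypothetical helpers, the form fields are copied from the answer and may no longer match the live site, and the demo frame is made up to mimic the printed shape:

```python
import io

import pandas as pd
import requests

# URL and form fields copied from the answer above; treat them as assumptions.
SEARCH_URL = "https://www.handelsregister.de/rp_web/mask.do?Typ=e"
HEADERS = {"User-Agent": "Mozilla/5.0"}

def build_payload(register_art: str = "HRA", per_page: int = 10) -> dict:
    """Form data for the extended search (hypothetical helper)."""
    return {
        "suchTyp": "e", "registerArt": register_art, "registerNummer": "",
        "bundeslandBW": "on", "registergericht": "", "schlagwoerter": "",
        "schlagwortOptionen": "2", "niederlassung": "", "rechtsform": "",
        "postleitzahl": "", "ort": "", "strasse": "",
        "ergebnisseProSeite": str(per_page), "btnSuche": "Find",
    }

def fetch_result_table(register_art: str = "HRA", per_page: int = 10) -> pd.DataFrame:
    """POST the search form and return the raw results table (network call)."""
    resp = requests.post(SEARCH_URL, data=build_payload(register_art, per_page),
                         headers=HEADERS, timeout=30)
    resp.raise_for_status()
    # Index 1 follows the answer above; it depends on the page layout.
    return pd.read_html(io.StringIO(resp.text))[1]

def company_rows(table: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows with text in column 0 (assumed to be company rows)."""
    # Link rows ("AD CD HD ...") and spacer rows have NaN in column 0.
    rows = table[table[0].notna()]
    # Drop the "Firma / Name" header row.
    rows = rows[rows[0] != "Firma / Name"]
    return rows.reset_index(drop=True)

# Offline demo on a made-up frame that only mimics the printed shape.
demo = pd.DataFrame({
    0: ["Firma / Name", "Baden-Württemberg Amtsgericht Ulm HRA 1", None, None],
    4: [None, None, "AD CD HD DK UT VÖ SI", None],
})
print(company_rows(demo))
```

With network access, company_rows(fetch_result_table()) would combine the two steps. Filtering on column 0 discards the link rows entirely, so if you need those document links you would pair each company row with the row that follows it instead.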
Try this instead of html = urlopen(url):
html = urlopen(url).read()
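BeautifulSoup accepts either raw markup (str or bytes) or an open file-like object, which is what urlopen() returns without .read(). A minimal offline sketch for comparing the two inputs, assuming io.BytesIO and a made-up page body as stand-ins for the HTTP response; tables_from is a hypothetical helper:

```python
import io

from bs4 import BeautifulSoup

# Made-up page body standing in for the bytes urlopen() would deliver.
PAGE = b"<html><body><table class='RegPortErg'><tr><td>x</td></tr></table></body></html>"

def tables_from(source):
    """Parse bytes or a file-like object and return all <table> tags."""
    return BeautifulSoup(source, "html.parser").find_all("table")

from_bytes = tables_from(PAGE)               # like urlopen(url).read()
from_handle = tables_from(io.BytesIO(PAGE))  # like urlopen(url)
print([str(t) for t in from_bytes] == [str(t) for t in from_handle])
```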