Reputation: 979
Is there a direct way to scrape an HTML table? Ideally I would pass in the class of the HTML table and get the results back.
For example, I need to get the table from this URL.
I can use this procedure, but I need a cleaner, more direct solution.
Upvotes: 0
Views: 71
Reputation: 20042
Well, then try this:
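import requests
import pandas as pd
from io import StringIO

url = "https://buchholz-stadtwerke.de/wasseranalyse.html"
# Wrap the HTML in StringIO; newer pandas versions deprecate passing a literal HTML string.
# read_html returns a list of DataFrames, one per table found on the page.
tables = pd.read_html(StringIO(requests.get(url).text), flavor="bs4")
df = pd.concat(tables)
df.to_csv("data.csv", index=False)
print(df)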
Output:
[                  Parameter  Einheit    Grenzwert  Messwert, Februar 2020
0           Wassertemperatur       °C          NaN                      98
1        Leitfähigkeit (25°)    µS/cm         2790                     302
2  Sauerstoff (elektrochem.)     mg/l          NaN                     109
3                    pH-Wert      NaN  6,5 bis 9,5                     806
4            Sättigungsindex      NaN          NaN                     001
5        Karbonathärte (dH°)      °dH          NaN                     454
6          Gesamthärte (dH°)      °dH          NaN                     645
7               Härtebereich      NaN          NaN                   weich
8        Calcitlösekapazität     mg/l            5                     -01
and so on...
This also writes a .csv file with the data from the table.
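One caveat with the output above: values like 98 for the water temperature and 806 for the pH value look as though the decimal commas were dropped. That is what read_html does by default, because it treats "," as a thousands separator. Here is a minimal sketch of a fix, assuming the page uses German number formatting. The inline HTML is a made-up stand-in for the real table, not a copy of it.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample markup (an assumption for illustration only).
html = """
<table>
  <tr><th>Parameter</th><th>Einheit</th><th>Messwert</th></tr>
  <tr><td>Wassertemperatur</td><td>°C</td><td>9,8</td></tr>
  <tr><td>Leitfähigkeit</td><td>µS/cm</td><td>1.302,5</td></tr>
</table>
"""

# decimal="," and thousands="." tell the parser to read numbers the German way.
# No flavor is given here, so pandas picks whichever HTML parser is installed.
german = pd.read_html(StringIO(html), decimal=",", thousands=".")[0]
print(german)
```

For the real page, pass StringIO(requests.get(url).text) instead of the inline string. Text cells such as "6,5 bis 9,5" should stay strings either way.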
EDIT:
This feels a bit like a hack, but it works. Based on the comment and the new URL: pd.read_html returns a list of tables, so you can loop over that list and write each table to a separate file.
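import requests
import pandas as pd
from io import StringIO

url = "https://www.swd-ag.de/energie-wasser/wasser/trinkwasseranalyse/"
# read_html returns a list of DataFrames, one per table found on the page.
tables = pd.read_html(StringIO(requests.get(url).text), flavor="bs4")
for index, table in enumerate(tables, start=1):
    # Write each table to its own file: table_1.csv, table_2.csv, ...
    table.to_csv(f"table_{index}.csv", index=False)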
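As for the original question about picking a table by its class: read_html accepts an attrs dictionary that restricts the result to tables with matching attributes. Here is a small sketch, where the class name "analysis" and the inline HTML are invented for illustration. Check the page source for the real class.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the second has the class we want.
html = """
<table class="nav">
  <tr><th>Link</th></tr>
  <tr><td>Home</td></tr>
</table>
<table class="analysis">
  <tr><th>Parameter</th><th>Einheit</th></tr>
  <tr><td>Gesamthärte</td><td>°dH</td></tr>
</table>
"""

# attrs filters on attributes of the <table> element itself.
tables = pd.read_html(StringIO(html), attrs={"class": "analysis"})
print(len(tables))
print(tables[0])
```

With a live page, the same attrs argument can be passed alongside StringIO(requests.get(url).text). If no table matches, read_html raises a ValueError instead of returning an empty list.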
Upvotes: 1