Reputation: 533
I want to extract data from a weather site and copy it to a CSV file for further analysis. I am using Python and BeautifulSoup. I have been struggling to get the affected cities and their values out of the weather report. Here is what the HTML looks like:
> <html> <head> <meta charset="utf-8"/> </head> <body>
> <div id="main">
> <div id="wettertab">
> <p>
> <strong>
> Letzte Aktualisierung: Do, 10. Aug, 18:41 Uhr
> </strong>
> </p>
> <h1 id="Hessen">
> Hessen
> </h1>
> <h2 id="Gemeinde Aarbergen">
> Gemeinde Aarbergen
> </h2>
> <table>
> <colgroup>
> <col <="" class="firstColumn" col=""/>
> <col class="colorColumn"/>
> <col class="colorColumn"/>
> <col class="colorColumn"/>
> <thead>
> <tr>
> <th>
> Schlagzeile
> </th>
> <th>
> Gültig von
> </th>
> <th>
> Gültig bis
> </th>
> <th>
> Beschreibung
> </th>
> </tr>
> </thead>
> <tr>
> <td>
> Amtliche WARNUNG vor DAUERREGEN
> </td>
> <td>
> Do, 10. Aug, 12:00 Uhr
> </td>
> <td>
> Sa, 12. Aug, 06:00 Uhr
> </td>
> <td>
> Es tritt Dauerregen mit Unterbrechungen auf. Dabei werden Niederschlagsmengen zwischen 40 l/m² und 60 l/m² erwartet.
> </td>
> </tr>
> </colgroup>
> </table>
There are four values from the tables that I need:
<tr>
<td> Amtliche WARNUNG vor DAUERREGEN
</td>
<td> Do, 10. Aug, 12:00 Uhr
</td>
<td> Sa, 12. Aug, 06:00 Uhr
</td>
<td> Es tritt Dauerregen mit Unterbrechungen auf. Dabei werden Niederschlagsmengen zwischen 40 l/m² und 60 l/m² erwartet.
</td>
</tr>
And I also need the name of the place:
<h2 id="Gemeinde Aarbergen">
Gemeinde Aarbergen
</h2>
The h2 tag always comes right before the table, but as far as I can see it is not part of the table itself.
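Because the heading and the table are siblings inside the same div, one way to pair them is to start from each h2 and step forward to the next table element. A minimal sketch, assuming a cut-down copy of the question's HTML and that every h2 is directly followed by its own table:

```python
from bs4 import BeautifulSoup

# Cut-down copy of the question's HTML (one municipality, one warning)
HTML = """
<div id="wettertab">
<h1 id="Hessen">Hessen</h1>
<h2 id="Gemeinde Aarbergen">Gemeinde Aarbergen</h2>
<table>
<tr><th>Schlagzeile</th><th>Gültig von</th><th>Gültig bis</th><th>Beschreibung</th></tr>
<tr><td>Amtliche WARNUNG vor DAUERREGEN</td><td>Do, 10. Aug, 12:00 Uhr</td><td>Sa, 12. Aug, 06:00 Uhr</td>
<td>Es tritt Dauerregen mit Unterbrechungen auf. Dabei werden Niederschlagsmengen zwischen 40 l/m² und 60 l/m² erwartet.</td></tr>
</table>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

pairs = []
for heading in soup.find_all("h2"):
    # The table is not inside the <h2>; it is the next sibling element
    table = heading.find_next_sibling("table")
    if table is None:
        continue
    cells = [td.get_text(strip=True) for td in table.find_all("td")]
    pairs.append((heading.get_text(strip=True), cells))

for name, cells in pairs:
    print(name, cells)
```

One caveat with this direction: if a heading had no table of its own, find_next_sibling("table") would pick up the next municipality's table, so real data may need an extra check that no other h2 lies in between.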
This is my code so far:
from bs4 import BeautifulSoup

# urlopen() expects a URL, not a local path, so open the file directly
with open("html_warnung.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

# Print the text of every cell in every table
for table in soup.find_all("table"):
    for cell in table.find_all("td"):
        print(cell.get_text(strip=True))
With this I can print the values from the tables, and I can also get the city names with:
gemeinde_list = []
for gemeinde in soup.find_all('h2'):
    gemeinde_list.append(gemeinde.get("id"))
What would be the best way to export all of this information together to a CSV file, with each value in its own field, like this:
Gemeinde Aarbergen
Amtliche WARNUNG vor DAUERREGEN
Do, 10. Aug, 12:00 Uhr
Sa, 12. Aug, 06:00 Uhr
Es tritt Dauerregen wechselnder Intensität auf. Dabei werden Niederschlagsmengen zwischen 35 l/m² und 50 l/m² erwartet. In Staulagen werden Mengen bis 70 l/m² erreicht.
I am using Python 3.6. Any help would be appreciated.
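The two date fields contain commas ("Do, 10. Aug, 12:00 Uhr"), so joining the values with "," by hand would split each date across three columns; the csv module quotes such fields automatically. A minimal sketch of the desired record written with a header row, assuming column names taken from the table's th labels plus a "Gemeinde" column (my choice), with io.StringIO standing in for a real file:

```python
import csv
import io

# Header: "Gemeinde" plus the four <th> labels from the question's table
header = ["Gemeinde", "Schlagzeile", "Gültig von", "Gültig bis", "Beschreibung"]
record = [
    "Gemeinde Aarbergen",
    "Amtliche WARNUNG vor DAUERREGEN",
    "Do, 10. Aug, 12:00 Uhr",
    "Sa, 12. Aug, 06:00 Uhr",
    "Es tritt Dauerregen wechselnder Intensität auf. Dabei werden "
    "Niederschlagsmengen zwischen 35 l/m² und 50 l/m² erwartet. "
    "In Staulagen werden Mengen bis 70 l/m² erreicht.",
]

buffer = io.StringIO()  # stands in for open(..., "w", newline="", encoding="utf-8")
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerow(record)  # fields containing commas are wrapped in double quotes
print(buffer.getvalue())
```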
Upvotes: 0
Views: 2386
Reputation: 15376
Since neither the table nor the heading has any distinguishing attributes, you can use the find_next_siblings / find_previous_siblings methods to get neighbouring tags.
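The code below relies on the order in which find_previous_siblings returns its matches. A small sketch on a made-up fragment (the ids "first" and "second" are hypothetical) shows that the nearest sibling comes first, which is why previous[0] is the heading directly above the table:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: two headings, the table belongs to the second one
HTML = "<div><h2 id='first'>A</h2><p>x</p><h2 id='second'>B</h2><table></table></div>"
soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("table")

# Previous siblings are returned nearest-first
previous = table.find_previous_siblings("h2")
print([h.get("id") for h in previous])  # ['second', 'first']

# The singular form returns only the nearest match (or None)
print(table.find_previous_sibling("h2").get("id"))  # second
```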
tables = soup.find_all('table')
data = []
for table in tables:
    # find_previous_siblings() returns the nearest <h2> first
    previous = table.find_previous_siblings('h2')
    gemeinde = previous[0].get('id') if previous else None
    rows = [td.get_text(strip=True) for td in table.find_all('td')]
    data.append([gemeinde] + rows)
The data variable is a nested list that you can now write to a CSV file.
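import csv

# Python 3: open in text mode with newline='' ('wb' only works in Python 2)
with open('my_file.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerows(data)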
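The loop above puts every td of a table into a single row, so a table listing two warnings for the same municipality would end up as one long row. A variant that emits one CSV row per tr keeps the warnings apart. This is a sketch under the assumption that each warning is its own tr; the sample reuses the two descriptions quoted in the question as two rows, which is illustrative only:

```python
from bs4 import BeautifulSoup

# Sample modelled on the question's HTML; the two rows are illustrative
HTML = """
<div id="wettertab">
<h2 id="Gemeinde Aarbergen">Gemeinde Aarbergen</h2>
<table>
<tr><th>Schlagzeile</th><th>Gültig von</th><th>Gültig bis</th><th>Beschreibung</th></tr>
<tr><td>Amtliche WARNUNG vor DAUERREGEN</td><td>Do, 10. Aug, 12:00 Uhr</td><td>Sa, 12. Aug, 06:00 Uhr</td>
<td>Es tritt Dauerregen mit Unterbrechungen auf. Dabei werden Niederschlagsmengen zwischen 40 l/m² und 60 l/m² erwartet.</td></tr>
<tr><td>Amtliche WARNUNG vor DAUERREGEN</td><td>Do, 10. Aug, 12:00 Uhr</td><td>Sa, 12. Aug, 06:00 Uhr</td>
<td>Es tritt Dauerregen wechselnder Intensität auf. Dabei werden Niederschlagsmengen zwischen 35 l/m² und 50 l/m² erwartet. In Staulagen werden Mengen bis 70 l/m² erreicht.</td></tr>
</table>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

data = []
for table in soup.find_all("table"):
    # Nearest <h2> above the table names the municipality
    heading = table.find_previous_sibling("h2")
    gemeinde = heading.get("id") if heading else None
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has only <th> cells, so it is skipped
            data.append([gemeinde] + cells)

for row in data:
    print(row)
```

The resulting data has the same shape as before (one list per row), so it can be passed to writer.writerows unchanged.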
Upvotes: 2
Reputation: 851
You can put the data you want to save in a csv row into a tuple. Basically, assign them to a variable while you are extracting them and put all of them into a tuple. I do not fully understand the structure of the data you are extracting.
But I guess:
city_name = "Gemeinde Aarbergen"
start_date = "Do, 10. Aug, 12:00 Uhr"
end_date = "Sa, 12. Aug, 06:00 Uhr"
desc = "Es tritt Dauerregen wechselnder Intensität auf. Dabei werden Niederschlagsmengen zwischen 35 l/m² und 50 l/m² erwartet. In Staulagen werden Mengen bis 70 l/m² erreicht."
As I said, I don't know exactly what the fields are, so you may want to give them better names. Then you will have:
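import csv

csv_row = (city_name, start_date, end_date, desc)
# Python 3: open in text mode with newline='' ("wb" only works in Python 2)
with open(filename, "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    writer.writerow(csv_row)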
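If you want the field names in the file as well, csv.DictWriter writes a header row and lets you fill each row by name instead of by position. A minimal sketch reusing the variable names above as column names; the file name "warnungen.csv" is hypothetical, and the file is written to a temporary directory and read back to check that the commas in the dates and the non-ASCII characters (ä, ²) survive:

```python
import csv
import os
import tempfile

# Values as in the answer above
city_name = "Gemeinde Aarbergen"
start_date = "Do, 10. Aug, 12:00 Uhr"
end_date = "Sa, 12. Aug, 06:00 Uhr"
desc = ("Es tritt Dauerregen wechselnder Intensität auf. Dabei werden "
        "Niederschlagsmengen zwischen 35 l/m² und 50 l/m² erwartet. "
        "In Staulagen werden Mengen bis 70 l/m² erreicht.")

fieldnames = ["city_name", "start_date", "end_date", "desc"]

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name inside a throwaway directory
    filename = os.path.join(tmp, "warnungen.csv")
    with open(filename, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerow({"city_name": city_name, "start_date": start_date,
                         "end_date": end_date, "desc": desc})
    # Read the file back with the same settings to verify the round trip
    with open(filename, newline="", encoding="utf-8") as csv_file:
        rows = list(csv.DictReader(csv_file))

print(rows[0]["start_date"])  # Do, 10. Aug, 12:00 Uhr
```

If the file is meant to be opened in Excel, encoding="utf-8-sig" (which adds a byte-order mark) is a common choice so the umlauts are detected correctly.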
Hope this makes sense.
Upvotes: 0