Reputation: 881
I wrote the following code:
#!/usr/bin/python
#weather.scraper
from bs4 import BeautifulSoup
import urllib

def main():
    """weather scraper"""
    r = urllib.urlopen("https://www.wunderground.com/history/airport/KPHL/2016/1/1/MonthlyHistory.html?&reqdb.zip=&reqdb.magic=&reqdb.wmo=&MR=1").read()
    soup = BeautifulSoup(r, "html.parser")
    table = soup.find_all("table", class_="responsive airport-history-summary-table")
    tr = soup.find_all("tr")
    td = soup.find_all("td")
    print table

if __name__ == "__main__":
    main()
When I print the table, I get all the HTML (td, tr, span, etc.) as well. How can I print only the content of the table (tr, td) without the HTML?
THANKS!
Upvotes: 0
Views: 141
Reputation: 18745
You have to use the .getText() method (also spelled .get_text()) when you want the text content of an element. Since find_all returns a list of elements, you have to pick one of them first (for example td[0]) and call the method on that.
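To make the difference between one element and the list concrete, here is a minimal sketch. It is written for Python 3 and uses a small made-up HTML string instead of the real page, so the markup and values are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup, not the real weather page.
html = "<table><tr><td>Max <span>41</span></td><td>Min <span>28</span></td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

cells = soup.find_all("td")                 # a list-like ResultSet of Tag objects
first_text = cells[0].get_text()            # text of one element, nested tags stripped
all_texts = [c.get_text() for c in cells]   # text of every element in the list

print(first_text)
print(all_texts)
```

Calling get_text() on an element also collects the text of nested tags such as the span, which is why no markup is left in the result.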
Or you can do, for example:
for tr in soup.find_all("tr"):
    print '>>>> NEW row <<<<'
    print '|'.join([x.getText() for x in tr.find_all('td')])
The loop above prints each row on its own line, with the cells side by side and separated by |.
Note that your code finds all td's and all tr's in the whole document, but you probably want just those inside the table.
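If you want to look for elements inside the table, you have to call table.find('tr') instead of soup.find('tr'), so that BeautifulSoup looks for tr's in the table instead of the whole HTML. Here table must be a single element (for example table[0], or the result of soup.find), because find_all returns a list and you cannot call find on the list itself.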
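As a rough sketch of searching inside one table only (Python 3, with a made-up HTML string; only the class name is taken from the question, everything else is an assumption):

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two tables; only the second one is wanted.
html = """
<table class="other"><tr><td>ignore me</td></tr></table>
<table class="responsive airport-history-summary-table">
  <tr><td>Max Temperature</td><td>41</td></tr>
  <tr><td>Min Temperature</td><td>28</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# soup.find returns a single Tag (or None), so .find_all can be called on it.
table = soup.find("table", class_="airport-history-summary-table")

# Search for rows and cells relative to that table, not the whole document.
rows = [
    "|".join(td.get_text(strip=True) for td in tr.find_all("td"))
    for tr in table.find_all("tr")
]

for row in rows:
    print(row)
```

Passing a single class name to class_ matches any tag that has that class among its classes, so the full two-class string is not required here.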
YOUR CODE MODIFIED (according to your comment that there are more tables):
#!/usr/bin/python
#weather.scraper
from bs4 import BeautifulSoup
import urllib

def main():
    """weather scraper"""
    r = urllib.urlopen("https://www.wunderground.com/history/airport/KPHL/2016/1/1/MonthlyHistory.html?&reqdb.zip=&reqdb.magic=&reqdb.wmo=&MR=1").read()
    soup = BeautifulSoup(r, "html.parser")
    tables = soup.find_all("table")
    for table in tables:
        print '>>>>>>> NEW TABLE <<<<<<<<<'
        trs = table.find_all("tr")
        for tr in trs:
            # for each row of current table, write it using | between cells
            print '|'.join([x.get_text().replace('\n','') for x in tr.find_all('td')])

if __name__ == "__main__":
    main()
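The code above is Python 2 (print statement, urllib.urlopen). If you are on Python 3, a possible port looks like the sketch below. The helper names fetch and tables_as_rows are my own, the sample HTML is made up, and I have not checked that the URL from the question still serves the same page.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch(url):
    """Download a page and return its raw bytes (Python 3 counterpart of urllib.urlopen)."""
    with urlopen(url) as response:
        return response.read()


def tables_as_rows(html):
    """Return one list per table; each item is a row with its cells joined by |."""
    soup = BeautifulSoup(html, "html.parser")
    result = []
    for table in soup.find_all("table"):
        rows = []
        for tr in table.find_all("tr"):
            cells = [td.get_text().replace("\n", "") for td in tr.find_all("td")]
            if cells:  # skip rows without td cells, e.g. header rows made of th
                rows.append("|".join(cells))
        result.append(rows)
    return result


# Hypothetical sample so the parsing part can be tried without network access.
sample = (
    "<table><tr><th>Day</th></tr><tr><td>1</td><td>41\n</td></tr></table>"
    "<table><tr><td>a</td></tr></table>"
)
parsed = tables_as_rows(sample)
print(parsed)
```

For the real page you would call tables_as_rows(fetch(url)) with the URL from the question.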
Upvotes: 2