I have this simple HTML table row, pulled from http://my.gwu.edu/mod/pws/courses.cfm?campId=1&termId=201501&subjId=ACCY:
<tr align="center" class="tableRow1Font" >
<td>OPEN</td>
<td>80002</td>
<td>
<span style="font-weight:bold;">
ACCY
</span>
<A HREF="http://bulletin.gwu.edu/search/?P=ACCY+2001" target="_blank">
<span style="font-weight:bold;">
2001
</span>
</A>
</td>
<td>10</td>
<td>Intro Financial Accounting</td>
<td>3.00</td>
<td> Ray, K</td>
<td><a href="http://virtualtour.gwu.edu/#MON" target="_blank" >MON</a> 113</td>
<td>MW<br>12:45PM - 02:00PM</td>
<td>08/25/14 - 12/06/14</td>
<td>
</td>
</tr>
I would like to find all the tr align="center" tags and then extract the text of the td cells inside each one. I would like my code output to look like this (each td value separated by a comma, one row per line):
OPEN, 80002, ACCY 2001, 10, Intro to Financial Accounting, 3.00, Ray, K, MW 12:45-02:00
My code:
import bs4
import requests

response = requests.get('http://my.gwu.edu/mod/pws/courses.cfm?campId=1&termId=201501&subjId=ACCY')
soup = bs4.BeautifulSoup(response.text)
for tr in soup.findAll('tr align="center"'):
    stack = []
    for td in tr.findAll('td'):
        stack.append(td.text.strip())
    print(",".join(stack))
This is not working. How can I grab the 'td' values from only the 'tr align=center' tags?
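To select these rows from the given HTML, it is easier to match on the class="tableRow1Font" attribute. Because class is a reserved word in Python, Beautiful Soup (4.1.2 and later) accepts it as the keyword argument class_. The code can be written something like:
for tr in soup.findAll('tr', class_="tableRow1Font"):
    cells = [td.get_text(" ", strip=True) for td in tr.findAll('td')]
    print(", ".join(cells))
To match on align="center" itself, pass a dict to the attrs argument of findAll (note: attrs, not attr):
rows = soup.findAll('tr', attrs={'align': 'center'})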
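Here is a self-contained sketch of the class_ approach that runs without network access. It assumes a trimmed copy of the row from the question embedded as a string (instead of fetching the GWU page) and uses Python's built-in "html.parser"; both are illustrative choices, not part of the original answer. Passing a space separator to get_text joins the two nested span texts in the course-number cell, giving "ACCY 2001" rather than "ACCY2001".

```python
from bs4 import BeautifulSoup

# Assumption: a trimmed copy of one row from the question, used instead of
# downloading the live page.
html = """
<table>
<tr align="center" class="tableRow1Font">
<td>OPEN</td>
<td>80002</td>
<td><span style="font-weight:bold;">ACCY</span>
<a href="http://bulletin.gwu.edu/search/?P=ACCY+2001"><span>2001</span></a></td>
<td>10</td>
<td>Intro Financial Accounting</td>
<td>3.00</td>
<td> Ray, K</td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find_all("tr", class_="tableRow1Font"):
    # get_text(" ", strip=True) strips every text fragment inside the cell,
    # drops empty ones, and joins the rest with single spaces.
    cells = [td.get_text(" ", strip=True) for td in tr.find_all("td")]
    rows.append(", ".join(cells))

for line in rows:
    print(line)
```

This prints one comma-separated line per matching row.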
A quick read of the docs shows that the first parameter to find_all is the name of the tag ('tr' in this case). Any other attributes must be passed as keyword arguments:
>>> soup.find_all('tr', align='center')
[<tr align="center" class="tableRow1Font">
<td>OPEN</td>
<td>80002</td>
<td>
<span style="font-weight:bold;">
ACCY
</span>
<a href="http://bulletin.gwu.edu/search/?P=ACCY+2001" target="_blank">
<span style="font-weight:bold;">
2001
</span>
</a>
</td>
<td>10</td>
<td>Intro Financial Accounting</td>
<td>3.00</td>
<td> Ray, K</td>
<td><a href="http://virtualtour.gwu.edu/#MON" target="_blank">MON</a> 113</td>
<td>MW<br/>12:45PM - 02:00PM</td>
<td>08/25/14 - 12/06/14</td>
<td>
</td>
</tr>]
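To make the difference concrete, this sketch contrasts the question's call with the keyword form. The small HTML snippet is a hypothetical stand-in for the real page (the second row exists only to show it is filtered out), and "html.parser" is an assumed parser choice. The question's string is treated as a single tag name, and no tag is literally named 'tr align="center"', so the loop body never runs.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: one centered row like the question's, one that is not.
html = """
<table>
<tr align="center" class="tableRow1Font">
<td>OPEN</td><td>80002</td><td>MW<br>12:45PM - 02:00PM</td>
</tr>
<tr align="left"><td>ignored</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# The question's call: the whole string is matched as a tag *name*,
# so nothing is found.
wrong = soup.find_all('tr align="center"')

# Tag name first, attribute filter as a keyword argument.
right = soup.find_all("tr", align="center")

for tr in right:
    # The space separator also turns the <br> in the time cell into a space.
    print(", ".join(td.get_text(" ", strip=True) for td in tr.find_all("td")))
```

Expected output is a single line, OPEN, 80002, MW 12:45PM - 02:00PM; the align="left" row is skipped.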
Alternatively, you can pass a dict of attributes to match using the attrs parameter:
>>> soup.find_all('tr', attrs={'align': 'center'})
This is useful when the attribute name is not a valid Python keyword argument, for example a hyphenated HTML5 data-* attribute.
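As an illustration of that case, the sketch below uses a made-up data-term attribute (it is not on the real GWU page). Writing data-term="201501" as a keyword argument would be a Python syntax error, so the attrs dict is the way to express it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: data-term is invented for this example.
html = '<table><tr data-term="201501" align="center"><td>OPEN</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# soup.find_all("tr", data-term="201501") would not even parse as Python,
# so the hyphenated attribute goes through the attrs dict instead.
rows = soup.find_all("tr", attrs={"data-term": "201501"})

print(len(rows))
```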