How to grab a specific <td> within a <tr> with BeautifulSoup

Question

I'm trying to grab the names of all the high schools from Wikipedia's list of high schools in New York City.

My script so far gets everything inside the tags of the table that lists each high school, its academic area and its entrance criteria. How can I narrow that down to just the name of the school? I expected it to be in td[0], but that raises a KeyError.

Code I've written thus far:

from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3; on Python 2 this was urllib2

NYC = 'https://en.wikipedia.org/wiki/List_of_high_schools_in_New_York_City'

html = urlopen(NYC)
soup = BeautifulSoup(html.read(), 'lxml')
schooltable = soup.find('table')
for td in schooltable:
    print(td)

Output I receive:


    The Beacon School
    Humanities & interdisciplinary
    Academic record, interview

Output I'm seeking:

The Beacon School

alecxe · Accepted Answer

How about this: get the first table on the page, iterate over all of its rows except the first (header) row, and take the first td element of each row. Works for me:

for row in soup.table.find_all('tr')[1:]:
    print(row.td.text)
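
As for the KeyError: subscripting a BeautifulSoup Tag looks up an HTML attribute, so td[0] searches for an attribute named 0 rather than the first cell. To pick a cell by position, index the list returned by find_all('td') instead. Below is a minimal sketch of that idea. It runs against a small inline table rather than the live Wikipedia page; SAMPLE_HTML, the "Example High School" row and the column_text helper are illustrative assumptions, not part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia table, so the sketch runs offline.
SAMPLE_HTML = """
<table>
  <tr><th>School</th><th>Academic area</th><th>Entrance criteria</th></tr>
  <tr><td><a href="#">The Beacon School</a></td><td>Humanities &amp; interdisciplinary</td><td>Academic record, interview</td></tr>
  <tr><td>Example High School</td><td>Science</td><td>Exam</td></tr>
</table>
"""


def column_text(table, index=0):
    """Return the text of the index-th <td> in every data row of a table."""
    values = []
    for row in table.find_all('tr'):
        cells = row.find_all('td')
        # Header rows contain only <th> cells, so they yield no <td> and are skipped.
        if len(cells) > index:
            values.append(cells[index].get_text(strip=True))
    return values


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
names = column_text(soup.table, 0)
print(names)
```

Checking the number of cells instead of slicing off the first row keeps the helper from failing on tables with several header rows or with short rows; passing index=1 or index=2 would return the other columns the same way.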
