Reputation: 928
Here are the code and sample results. I just want the first column of the table, ignoring the rest. There are similar questions on Stack Overflow, but they did not help.
<tr>
<td>JOHNSON</td>
<td> 2,014,470 </td>
<td>0.81</td>
<td>2</td>
</tr>
I want only JOHNSON, as it is the first child. My Python code is:
import requests
from bs4 import BeautifulSoup

def find_raw():
    url = 'http://names.mongabay.com/most_common_surnames.htm'
    r = requests.get(url)
    html = r.content
    soup = BeautifulSoup(html)
    for n in soup.find_all('tr'):
        print(n.text)  # prints the text of every cell in the row

find_raw()
What I get:
SMITH 2,501,922 1.0061
JOHNSON 2,014,470 0.812
Upvotes: 3
Views: 7337
Reputation: 3410
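Iterate over the tr elements, then print the text of the first td in each:
import bs4

# data is assumed to hold the page's HTML
for tr in bs4.BeautifulSoup(data).select('tr'):
    try:
        print(tr.select('td')[0].text)
    except IndexError:
        pass  # row has no td cells (e.g. a header row)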
Or shorter:
>>> [tr.td for tr in bs4.BeautifulSoup(data).select('tr') if tr.td]
[<td>SMITH</td>, <td>JOHNSON</td>, <td>WILLIAMS</td>, <td>JONES</td>, ...]
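Note that the comprehension above returns the td elements themselves, not their text. As a rough sketch of a variant that returns plain strings, the CSS selector tr > td:first-child can do the row filtering in one step. The sample markup and the function name first_cells_css are made up for illustration (the JOHNSON row is from the question, the rest is abbreviated), and html.parser is chosen here only so the example runs without extra dependencies:

```python
from bs4 import BeautifulSoup

# Hypothetical, abbreviated stand-in for the real page.
SAMPLE_HTML = """
<table>
<tr><th>Surname</th><th>Count</th></tr>
<tr><td>SMITH</td><td>2,501,922</td></tr>
<tr><td>JOHNSON</td><td> 2,014,470 </td><td>0.81</td><td>2</td></tr>
</table>
"""

def first_cells_css(html):
    """Return the stripped text of each row's first cell, if it is a td."""
    soup = BeautifulSoup(html, "html.parser")
    # "tr > td:first-child" matches a td only when it is the first
    # child element of its row, so header rows made of th are skipped.
    return [td.get_text(strip=True) for td in soup.select("tr > td:first-child")]

print(first_cells_css(SAMPLE_HTML))
```

On the real page the selector may need adjusting if the table contains nested tables or rows whose first cell is not a td.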
Upvotes: 3
Reputation: 31484
You can find all the tr tags with find_all, then for each tr you find (which gives only the first) td. If it exists, you print it:
for tr in soup.find_all('tr'):
    td = tr.find('td')
    if td:
        print(td)  # prints the whole tag; use td.text for just the name
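For reference, the same find_all / find approach can be wrapped in a small function that collects the names instead of printing tags. This is only a sketch: the function name first_column and the inline sample markup are assumptions for illustration, and the parser is passed explicitly as html.parser so the result does not depend on which parsers are installed:

```python
from bs4 import BeautifulSoup

# Hypothetical sample; the JOHNSON row is the one shown in the question.
SAMPLE_ROWS = """
<table>
<tr><th>Surname</th><th>Count</th></tr>
<tr><td>SMITH</td><td>2,501,922</td></tr>
<tr>
<td>JOHNSON</td>
<td> 2,014,470 </td>
<td>0.81</td>
<td>2</td>
</tr>
</table>
"""

def first_column(html):
    """Collect the text of the first td in every tr that has one."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for tr in soup.find_all("tr"):
        td = tr.find("td")  # first td in the row, or None
        if td is not None:
            names.append(td.get_text(strip=True))
    return names

print(first_column(SAMPLE_ROWS))
```

With the downloaded page, first_column(r.content) should work the same way, assuming the surnames sit in the first td of each row as in the sample.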
Upvotes: 5