Reputation: 372
My code pulls information from a table on a webpage, but it only returns the text between the tags. Can someone help me get both the email and the name out of the tag that this code produces?
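import requests
from bs4 import BeautifulSoup

s = requests.Session()  # assumption: s is a requests session; the post does not show how it was created
emails = []
membership_url = 'http://url/members?letter=a'
print(membership_url)
member_page = s.get(membership_url)
soup = BeautifulSoup(member_page.content, 'html5lib')
members = soup.find_all("table")[4]  # the fifth table on the page
tds = members.find_all("td")
print(tds)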
SAMPLE OUTPUT:
<td><a href="../../options/johndoe--at--gmail.com">johndoe@gmail.com</a><br/><input name="johndoe%40gmail.com_realname" size="24" type="TEXT" value="John Doe"/><input name="user" type="HIDDEN" value="johndoe%40gmail.com"/></td>
I don't know a lot about bs4 or HTML so it's lucky I got this far. Ideally, I'd like to pull out both johndoe@gmail.com and the real name "John Doe". All I can get right now is the email from between the tags.
Upvotes: 0
Views: 29
Reputation: 84465
Without seeing the rest of the HTML, here is one possibility for bs4 4.7.1+ that looks for two adjacent input tags where the second has a name attribute with value user. The + is an adjacent sibling combinator, so input:has(+input[name=user]) matches the first input of the pair, the one that holds the real name. Your mileage may vary with the full HTML.
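from bs4 import BeautifulSoup as bs
import urllib.parse

s = '<td><a href="../../options/johndoe--at--gmail.com">johndoe@gmail.com</a><br/><input name="johndoe%40gmail.com_realname" size="24" type="TEXT" value="John Doe"/><input name="user" type="HIDDEN" value="johndoe%40gmail.com"/></td>'
soup = bs(s, 'html.parser')  # name the parser explicitly; bs(s) alone emits a warning
# select the input that is immediately followed by <input name="user">
node = soup.select_one('input:has(+ input[name=user])')
real_name = node['value']
# the following sibling input carries the URL-encoded email in its value
email = urllib.parse.unquote(node.find_next_sibling('input')['value'])
print(real_name, email)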
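If you need every member on the page rather than a single cell, the same idea can be put in a loop. This is only a sketch under assumptions: the two-row table is made-up sample data shaped like the cell in the question, extract_members is a hypothetical helper name, and it assumes each hidden user input sits right after its realname input. It starts from the user input and walks back to the previous input, so it does not rely on :has().

```python
from urllib.parse import unquote

from bs4 import BeautifulSoup

# Made-up sample data: two rows shaped like the cell shown in the question.
html = """
<table>
  <tr><td><a href="../../options/johndoe--at--gmail.com">johndoe@gmail.com</a><br/>
    <input name="johndoe%40gmail.com_realname" size="24" type="TEXT" value="John Doe"/>
    <input name="user" type="HIDDEN" value="johndoe%40gmail.com"/></td></tr>
  <tr><td><a href="../../options/janeroe--at--example.com">janeroe@example.com</a><br/>
    <input name="janeroe%40example.com_realname" size="24" type="TEXT" value="Jane Roe"/>
    <input name="user" type="HIDDEN" value="janeroe%40example.com"/></td></tr>
</table>
"""


def extract_members(markup):
    """Return a list of (real_name, email) tuples found in the markup."""
    soup = BeautifulSoup(markup, "html.parser")
    members = []
    # Each hidden "user" input carries the URL-encoded email address.
    for user_input in soup.select('input[name="user"]'):
        email = unquote(user_input["value"])
        # Assumption: the nearest preceding input is the matching realname field.
        name_input = user_input.find_previous_sibling("input")
        real_name = name_input.get("value", "") if name_input else ""
        members.append((real_name, email))
    return members


members = extract_members(html)
print(members)
```

On a live page you would pass member_page.content (or just the table you picked out) instead of the sample string, and check a few rows by hand, since the real markup may differ from this cell.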
Upvotes: 2