Reputation: 93
I am either receiving an error or nothing is being parsed/written with the following code:
soup = BeautifulSoup(browser.page_source, 'html.parser')
userinfo = soup.find_all("div", attrs={"class": "fieldWrapper"})
rows = userinfo.find_all(attrs="value")
with open('testfile1.csv', 'w') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(rows)
rows = userinfo.find_all(attrs="value")
AttributeError: 'ResultSet' object has no attribute 'find_all'
So I tried a for loop with print just to test it, but that returns nothing while the program runs successfully:
userinfo = soup.find_all("div", attrs={"class": "fieldWrapper"})
for row in userinfo:
    rows = row.find_all(attrs="value")
    print(rows)
This is the HTML I am trying to parse. I want to extract the text of the value attributes:
<div class="controlHolder">
<div id="usernameWrapper" class="fieldWrapper">
<span class="styled">Username:</span>
<div class="theField">
<input name="ctl00$cleanMainPlaceHolder$tbUsername" type="text" value="username" maxlength="16" id="ctl00_cleanMainPlaceHolder_tbUsername" disabled="disabled" tabindex="1" class="textbox longTextBox">
<input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnUserName" id="ctl00_cleanMainPlaceHolder_hdnUserName" value="AAubrey">
</div>
</div>
<div id="fullNameWrapper" class="fieldWrapper">
<span class="styled">Full Name:</span>
<div class="theField">
<input name="ctl00$cleanMainPlaceHolder$tbFullName" type="text" value="Full Name" maxlength="50" id="ctl00_cleanMainPlaceHolder_tbFullName" tabindex="2" class="textbox longTextBox">
<input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnFullName" id="ctl00_cleanMainPlaceHolder_hdnFullName" value="Anthony Aubrey">
</div>
</div>
<div id="emailWrapper" class="fieldWrapper">
<span class="styled">Email:</span>
<div class="theField">
<input name="ctl00$cleanMainPlaceHolder$tbEmail" type="text" value="[email protected]" maxlength="60" id="ctl00_cleanMainPlaceHolder_tbEmail" tabindex="3" class="textbox longTextBox">
<input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnEmail" id="ctl00_cleanMainPlaceHolder_hdnEmail" value="[email protected]">
<span id="ctl00_cleanMainPlaceHolder_validateEmail" style="color:Red;display:none;">Invalid E-Mail</span>
</div>
</div>
<div id="commentWrapper" class="fieldWrapper">
<span class="styled">Comment:</span>
<div class="theField">
<textarea name="ctl00$cleanMainPlaceHolder$tbComment" rows="2" cols="20" id="ctl00_cleanMainPlaceHolder_tbComment" tabindex="4" class="textbox longTextBox"></textarea>
<input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnComment" id="ctl00_cleanMainPlaceHolder_hdnComment">
</div>
</div>
Upvotes: 0
Views: 58
Reputation: 76
Your first error stems from the fact that find_all returns a ResultSet, which is more or less a list: you would have to iterate through the elements of userinfo and call find_all on those instead.
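As a minimal sketch of that iteration (the HTML here is a cut-down stand-in for the page in the question, and the variable names are only illustrative):

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the page source; not the real page.
html = """
<div class="fieldWrapper"><input type="text" value="a"></div>
<div class="fieldWrapper"><input type="text" value="b"></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet (a list of Tag objects).
wrappers = soup.find_all("div", attrs={"class": "fieldWrapper"})

# Call find_all on each Tag inside the ResultSet, not on the ResultSet itself.
inputs = []
for wrapper in wrappers:
    inputs.extend(wrapper.find_all("input"))

values = [tag.get("value") for tag in inputs]
print(values)
```

Each element of the ResultSet is an ordinary Tag, so anything you would call on soup (find, find_all, get) works on it.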
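For your second issue, I'm pretty sure that when attrs is passed a string, it searches for elements with that string as their class. The HTML you provided contains no elements with class value, so it makes sense that nothing useful gets printed: each find_all call returns an empty result. To match elements that have a value attribute, pass a dictionary such as attrs={"value": True} instead. You can then read an element's value with .get('value').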
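The difference between the two forms of attrs can be checked on a small example. This is only a sketch on made-up HTML, assuming the string form is treated as a class search as described above:

```python
from bs4 import BeautifulSoup

# Made-up markup: two inputs carry a value attribute, one does not.
html = """
<div class="fieldWrapper">
  <input type="text" value="shown">
  <input type="hidden" value="stored">
  <input type="hidden">
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A bare string is interpreted as a CSS class name, so this looks for class="value".
by_class = soup.find_all(attrs="value")

# A dict with True matches any tag that has the attribute, whatever its content.
with_value = soup.find_all(attrs={"value": True})

values = [tag.get("value") for tag in with_value]
print(len(by_class), values)
```

The input without a value attribute is skipped by the dictionary form, which avoids None entries in the result.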
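To print out the value of the text inputs, the following code should work. (The try/except is just so the script doesn't crash when a wrapper has no text input, as with the comment field: find returns None there, and calling .get on None raises AttributeError.)
for field_wrapper in soup.find_all("div", attrs={"class": "fieldWrapper"}):
    try:
        print(field_wrapper.find("input", attrs={"type": "text"}).get('value'))
    except AttributeError:
        # find() returned None: this wrapper has no text input
        continue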
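Since the original goal was a CSV file, here is one possible way to combine the pieces. It is a sketch under assumptions: collect_values and write_rows are hypothetical helper names, the HTML is a shortened copy of the question's markup, and an in-memory buffer stands in for testfile1.csv.

```python
import csv
import io

from bs4 import BeautifulSoup

# Shortened copy of the question's markup.
html = """
<div id="usernameWrapper" class="fieldWrapper">
  <input type="text" value="username">
  <input type="hidden" value="AAubrey">
</div>
<div id="commentWrapper" class="fieldWrapper">
  <textarea></textarea>
  <input type="hidden">
</div>
"""


def collect_values(soup):
    """Return one list of value strings per fieldWrapper div that has any."""
    rows = []
    for wrapper in soup.find_all("div", attrs={"class": "fieldWrapper"}):
        tags = wrapper.find_all("input", attrs={"value": True})
        values = [tag["value"] for tag in tags]
        if values:  # skip wrappers with no value attributes at all
            rows.append(values)
    return rows


def write_rows(rows, outfile):
    """Write each list of values as one CSV row to an open file object."""
    csv.writer(outfile).writerows(rows)


soup = BeautifulSoup(html, "html.parser")
rows = collect_values(soup)

buffer = io.StringIO()  # stand-in for an open file
write_rows(rows, buffer)
print(buffer.getvalue())
```

To write a real file, pass open('testfile1.csv', 'w', newline='') instead of the buffer; the csv module documentation recommends newline='' so that row endings are not translated twice.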
Upvotes: 1