nvachhan
nvachhan

Reputation: 93

Parsing HTML and writing to CSV using Beautifulsoup - AttributeError or no html being parsed

I am either receiving an error or nothing is being parsed/written with the following code:

soup = BeautifulSoup(browser.page_source, 'html.parser')
userinfo = soup.find_all("div", attrs={"class": "fieldWrapper"})
rows = userinfo.find_all(attrs="value")

with open('testfile1.csv', 'w') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(rows)

rows = userinfo.find_all(attrs="value")

AttributeError: 'ResultSet' object has no attribute 'find_all'

So I tried a for loop with print just to test it, but that returns nothing while the program runs successfully:

userinfo = soup.find_all("div", attrs={"class": "fieldWrapper"})
for row in userinfo:
    rows = row.find_all(attrs="value")
    print(rows)

This is the html I am trying to parse. I am trying to return the text from the value attributes:

<div class="controlHolder">
                        <div id="usernameWrapper" class="fieldWrapper">
                            <span class="styled">Username:</span>
                            <div class="theField">
                                <input name="ctl00$cleanMainPlaceHolder$tbUsername" type="text" value="username" maxlength="16" id="ctl00_cleanMainPlaceHolder_tbUsername" disabled="disabled" tabindex="1" class="textbox longTextBox">
                                <input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnUserName" id="ctl00_cleanMainPlaceHolder_hdnUserName" value="AAubrey"> 
                            </div>
                        </div>
                        <div id="fullNameWrapper" class="fieldWrapper">
                            <span class="styled">Full Name:</span>
                            <div class="theField">
                                <input name="ctl00$cleanMainPlaceHolder$tbFullName" type="text" value="Full Name" maxlength="50" id="ctl00_cleanMainPlaceHolder_tbFullName" tabindex="2" class="textbox longTextBox">
                                <input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnFullName" id="ctl00_cleanMainPlaceHolder_hdnFullName" value="Anthony Aubrey">
                            </div>
                        </div>
                        <div id="emailWrapper" class="fieldWrapper">
                            <span class="styled">Email:</span>
                            <div class="theField">
                                <input name="ctl00$cleanMainPlaceHolder$tbEmail" type="text" value="[email protected]" maxlength="60" id="ctl00_cleanMainPlaceHolder_tbEmail" tabindex="3" class="textbox longTextBox">
                                <input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnEmail" id="ctl00_cleanMainPlaceHolder_hdnEmail" value="[email protected]">
                                <span id="ctl00_cleanMainPlaceHolder_validateEmail" style="color:Red;display:none;">Invalid E-Mail</span>
                            </div>
                        </div>
                        <div id="commentWrapper" class="fieldWrapper">
                            <span class="styled">Comment:</span>
                            <div class="theField">
                                <textarea name="ctl00$cleanMainPlaceHolder$tbComment" rows="2" cols="20" id="ctl00_cleanMainPlaceHolder_tbComment" tabindex="4" class="textbox longTextBox"></textarea>
                                <input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnComment" id="ctl00_cleanMainPlaceHolder_hdnComment">
                            </div>
                        </div>

Upvotes: 0

Views: 58

Answers (1)

lanceg
lanceg

Reputation: 76

Your first error stems from the fact that find_all returns a ResultSet, which is more or less a list: you would have to iterate through the elements of userinfo and call find_all on those instead.

For your second issue, I'm pretty sure when attrs is passed a string, it searches for elements with that string as its class. The html you provided contains no elements with class value, so it makes sense that nothing would get printed out. You can access an element's value with .get('value')

To print out the value of the text inputs, the following code should work. (The try/except is just so the script doesn't crash if a text input isn't found)

for field_wrapper in soup.find_all("div", attrs={"class": "fieldWrapper"}):
    try:
        print(field_wrapper.find("input", attrs={"type": "text"}).get('value'))
    except:
        continue

Upvotes: 1

Related Questions