Reputation: 109
I am trying to build a simple web scraper to capture words from different dialects. The code is below:
from bs4 import BeautifulSoup, element
import pandas as pd
import requests
capture = requests.get('http://wiwords.com/dictionary/').text
#print(capture)
results = []
soup = BeautifulSoup(capture, 'lxml')
for words in soup.find('div', class_= 'panel-body'):
    extracted = element.find('h4') # Line that causes the error
    if extracted not in results:
        results.append(extracted.text)
However, when I run it, I get this error: "AttributeError: module 'bs4.element' has no attribute 'find'". I found a similar problem while searching another forum, but it had no valid answers. Any ideas as to what I could have done wrong?
Upvotes: 0
Views: 513
Reputation: 116
Replace element with the words variable, that is, change extracted = element.find('h4') to extracted = words.find('h4').
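To illustrate why the original line fails, here is a minimal sketch on an inline HTML snippet. The markup and the word in it are made up for illustration and are not taken from wiwords.com, and html.parser is used only so the example does not depend on lxml. The point is that element, as imported from bs4, is a module, while words is the tag produced by the loop.

```python
import types
from bs4 import BeautifulSoup, element

# Hypothetical markup standing in for the real page.
html = '<div class="panel-body"><h4>liming</h4></div>'
soup = BeautifulSoup(html, 'html.parser')

# bs4.element is a module (it defines Tag, NavigableString, ...),
# not a parsed tag, so calling find() on it cannot work.
is_module = isinstance(element, types.ModuleType)

# The loop variable is a Tag, and Tag objects do have find().
headings = []
for words in soup.find_all('div', class_='panel-body'):
    extracted = words.find('h4')
    headings.append(extracted.text)

print(is_module, headings)
```

Note that this sketch uses find_all, which yields each matching div, whereas soup.find returns only the first match and looping over it walks that div's children.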
<div class="panel-body">
<form class="navbar-form navbar-left" onsubmit="return false;" role="search">
<div class="form-group">
<select class="form-control" id="browse-country">
<option value="NONE">All Countries</option>
<option value="anguilla">Anguilla</option><option value="antigua-barbuda">Antigua & Barbuda</option><option value="aruba">Aruba</option><option value="bahamas">Bahamas</option><option value="barbados">Barbados</option><option value="belize">Belize</option><option value="bermuda">Bermuda</option><option value="british-vi">British Virgin Isles.</option><option value="cayman">Cayman Islands</option><option value="cuba">Cuba</option><option value="dominica">Dominica</option><option value="dominican-republic">Dominican Republic</option><option value="grenada">Grenada</option><option value="guadeloupe">Guadeloupe</option><option value="guyana">Guyana</option><option value="haiti">Haiti</option><option value="jamaica">Jamaica</option><option value="martinique">Martinique</option><option value="montserrat">Montserrat</option><option value="netherland-antilles">Netherland Antilles</option><option value="puerto-rico">Puerto Rico</option><option value="st-lucia">St. Lucia</option><option value="kitts-nevis">St. Kitts & Nevis</option><option value="st-vincent">St. Vincent</option><option value="st-martin">St. Martin/Maarten</option><option value="suriname">Suriname</option><option value="trinidad-tobago">Trinidad & Tobago</option><option value="turks-caicos">Turks & Caicos</option><option value="us-vi">US Virgin Islands</option><option value="venezuela">Venezuela</option> </select>
</div>
<div class="form-group">
<select class="form-control" id="browse-category">
<option value="NONE">All Categories</option>
<option value="large-up">Large up</option><option value="shout-out">Shout out</option><option value="anatomy">Anatomy</option><option value="animal">Animal</option><option value="bingy">Bingy</option><option value="bird">Bird</option><option value="clothes">Clothes</option><option value="dance">Dance</option><option value="derogatory">Derogatory</option><option value="family">Family</option><option value="folklore">Folklore</option><option value="food">Food</option><option value="fruit">Fruit</option><option value="game">Game</option><option value="insect">Insect</option><option value="money">Money</option><option value="music">Music</option><option value="mythology">Mythology</option><option value="national-symbol">National symbol</option><option value="people">People</option><option value="person">Person</option><option value="pg">Pg</option><option value="place">Place</option><option value="plant">Plant</option><option value="plants">Plants</option><option value="pq">Pq</option><option value="profanity">Profanity</option><option value="proverb">Proverb</option><option value="quality">Quality</option><option value="religion">Religion</option><option value="river">River</option><option value="sexual">Sexual</option><option value="sickness">Sickness</option><option value="similie">Similie</option><option value="superstition">Superstition</option><option value="trinidad">Trinidad</option><option value="trinidadandtobago">Trinidadandtobago</option><option value="trinidadcreole">Trinidadcreole</option><option value="vegetable">Vegetable</option><option value="weapon">Weapon</option> </select>
</div>
<button class="btn btn-default" id="browseBtn">Browse</button>
</form>
</div>
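This panel-body div contains no h4 element, so words.find('h4') returns None, and reading .text from None raises AttributeError. You can avoid this error by using a try-except block: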
soup = BeautifulSoup(capture, 'lxml')
for words in soup.find_all('div', class_= 'panel-body'):
    extracted = words.find('h4')
    if extracted not in results:
        try:
            results.append(extracted.text)
        except AttributeError:
            pass
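You can also skip the leading elements that have no h4 by slicing the result, for example soup.find_all('div', class_= 'panel-body')[2:]. Note that [2:] drops the first two matches; use [1:] if only the first one needs to be skipped.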
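As an alternative to both try-except and slicing, an explicit None check skips any div without an h4, wherever it appears. This is a minimal sketch on made-up markup (the words and the structure are assumptions, not the real page), parsed with html.parser so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first panel-body has no h4, like the search form above.
html = '''
<div class="panel-body"><form></form></div>
<div class="panel-body"><h4>liming</h4></div>
<div class="panel-body"><h4>bacchanal</h4></div>
'''
soup = BeautifulSoup(html, 'html.parser')

results = []
for words in soup.find_all('div', class_='panel-body'):
    extracted = words.find('h4')
    if extracted is None:
        continue  # this div has no heading, so there is nothing to collect
    results.append(extracted.text)

print(results)
```

The check makes the "no heading" case visible in the code, while a bare except AttributeError could also hide unrelated mistakes.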
You should also double check your if statement. It checks whether the HTML element is in the results list, but the list holds text, so the check never matches and you might end up with duplicates. You can rewrite your if statement as:
extracted = words.find('h4').text
if extracted not in results:
    results.append(extracted)
Or use a set to remove duplicates:
results = set(results)
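One caveat with set(results) is that a set does not keep the original order of the words. If the order matters, a possible variation is dict.fromkeys, which keeps the first occurrence of each item in order. The sample list below is made up for illustration.

```python
# Hypothetical scraped words, with duplicates.
results = ['liming', 'bacchanal', 'liming', 'jumbie', 'bacchanal']

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops duplicates while preserving first-seen order.
unique_results = list(dict.fromkeys(results))

print(unique_results)
```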
Upvotes: 1
Reputation: 1
from bs4 import BeautifulSoup
import pandas as pd
import requests
capture = requests.get('http://wiwords.com/dictionary/').text
#print(capture)
results = []
soup = BeautifulSoup(capture, 'lxml')
for words in soup.find_all('div', attrs={'class': 'panel-body'}):
    # find_all returns a list of tags, so read .text from each h4 in turn
    for extracted in words.find_all('h4'):
        if extracted.text not in results:
            results.append(extracted.text)
Can you try running your code like this?
Upvotes: 0