Reputation: 900
I want to scrape https://wiki.openstreetmap.org/wiki/Key:office, specifically the table containing all the tags, i.e. everything contained within:
<table class="wikitable taginfo-taglist">...</table>
Since everything within:
<div class="taglist" ...> ... </div>
(the parent of the table) is generated by JavaScript, I thought this code could work:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
options = Options()
options.add_argument("--headless")
caps = webdriver.DesiredCapabilities().FIREFOX
caps["marionette"] = True
driver = webdriver.Firefox(options=options, capabilities=caps, executable_path='../statics/geckodriver')
def get_tag_soup(url):
    driver.get(url)
    try:
        table = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "wikitable taginfo-taglist")))
        soup = BeautifulSoup(table.get_attribute('innerHTML'), 'lxml')
    except Exception as e:
        soup = e
    return soup

get_tag_soup('https://wiki.openstreetmap.org/wiki/Key:office')
But when I run this code I just get a selenium.common.exceptions.TimeoutException('', None, None).
More frustratingly, if I instead use WebDriverWait
for the parent of "wikitable taginfo-taglist",
with EC.presence_of_element_located((By.CLASS_NAME, "taglist")),
it sometimes works.
Upvotes: 1
Views: 364
Reputation: 193058
To extract the table containing all the tags, use WebDriverWait with visibility_of_element_located() instead of presence_of_element_located(). You can use either of the following locator strategies:
Using CSS_SELECTOR:
driver.get("https://wiki.openstreetmap.org/wiki/Key:office")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.wikitable.taginfo-taglist"))).text)
Using XPATH:
driver.get("https://wiki.openstreetmap.org/wiki/Key:office")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='wikitable taginfo-taglist']"))).text)
Console Output:
Key Value Element Description Map rendering Image Count
office accountant An office for an accountant.
6 895
1 967
14
office advertising_agency A service-based business dedicated to creating, planning, and handling advertising.
3 916
580
3
office architect An office for an architect or group of architects.
5 715
1 239
12
office association An office of a non-profit organisation, society, e.g. student, sport, consumer, automobile, bike association, etc.
13 054
3 286
50
office charity An office of a charitable organization
696
384
7
office company An office of a private company
129 801
36 951
608
office consulting An office for a consulting firm, providing expert professional advice to other companies or organisations.
1 341
162
4
office coworking An office where people can go to work (might require a fee); not limited to a single employer
1 297
320
7
office diplomatic
6 634
4 065
95
office educational_institution An office for an educational institution.
14 172
8 563
175
office employment_agency An office for an employment service.
7 300
1 771
43
office energy_supplier An office for a energy supplier.
2 237
1 112
19
office engineer An office for an engineer or group of engineers.
454
98
2
office estate_agent A place where you can rent or buy a house.
44 813
8 042
39
office financial An office of a company in the financial sector
4 891
1 588
24
office forestry A forestry office
523
741
9
office foundation An office of a foundation
1 757
542
10
office government An office of a (supra)national, regional or local government agency or department
98 289
70 569
2 300
office guide An office for tour guides, mountain guides, dive guides, etc.
587
168
1
office insurance An office at which you can take out insurance policies.
34 693
6 475
91
office it An office for an IT specialist.
9 486
2 039
51
office lawyer An office for a lawyer.
22 881
4 841
22
office logistics An office for a forwarder / hauler.
2 796
677
8
office moving_company An office which offers a relocation service.
605
252
4
office newspaper An office of a newspaper
3 511
1 450
27
office ngo An office for a non-profit, non-governmental organisation (NGO).
12 693
3 565
58
office notary An office for a notary public (common law)
3 860
548
9
office political_party An office of a political party
3 354
1 017
8
office property_management Office of a company, which manages a real estate property.
796
162
2
office quango An office of a quasi-autonomous non-governmental organisation.
366
233
4
office religion office of a community of faith
5 807
2 172
43
office research An office for research and development
3 667
4 545
348
office surveyor An office of a person doing surveys, this can be risk and damage evaluations of properties and equipment, opinion surveys or statistics.
451
109
1
office tax_advisor An office for a financial expert specially trained in tax law
5 053
823
4
office telecommunication An office for a telecommunication company
16 968
4 335
77
office visa An office of an organisation or business which offers visa assistance
95
1
0
office water_utility The office for a water utility company or water board.
743
908
20
office yes Generic tag for unspecified office type.
27 434
36 155
420
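The .text output above is flat: the cells of each row run together and the counts spill onto separate lines. If rows and cells are needed, one option is to parse the table's HTML instead. The following is a minimal sketch using only the standard library; the names TableRows and table_rows are made up for this example, it assumes the HTML comes from something like table.get_attribute("outerHTML"), and it does not handle nested tables.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace (including newlines) to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_rows(table_html):
    """Return the table as a list of rows, each a list of cell texts."""
    parser = TableRows()
    parser.feed(table_html)
    parser.close()
    return parser.rows


# Hypothetical usage with the element located by the waits above:
# rows = table_rows(table.get_attribute("outerHTML"))
```

The same rows could be built with BeautifulSoup, as in the question; the standard-library parser is used here only to keep the sketch free of extra dependencies.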
Note: Make sure the browser viewport is maximized, as follows:
options.add_argument("start-maximized")
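Apart from the visibility condition, the locators in this answer differ from the question's in one more way that is probably relevant: By.CLASS_NAME expects a single class name, while "wikitable taginfo-taglist" is two classes separated by a space. That would explain why waiting for "taglist" could succeed while the compound value timed out. The sketch below shows one way to build a CSS selector from a class attribute value and wait on it; the helper names css_from_class_attr and wait_for_table_text are hypothetical, and the Selenium imports are deferred so the string helper can be tried without a browser.

```python
def css_from_class_attr(class_attr, tag=""):
    """Turn a class attribute value such as 'wikitable taginfo-taglist'
    into a CSS selector such as '.wikitable.taginfo-taglist'."""
    names = class_attr.split()
    if not names:
        raise ValueError("class_attr must contain at least one class name")
    return tag + "".join("." + name for name in names)


def wait_for_table_text(driver, class_attr, timeout=20):
    """Wait until the matching <table> is visible and return its text."""
    # Imported here so css_from_class_attr works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    selector = css_from_class_attr(class_attr, tag="table")
    table = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, selector))
    )
    return table.text


# Hypothetical usage, assuming `driver` is an already started WebDriver:
# driver.get("https://wiki.openstreetmap.org/wiki/Key:office")
# print(wait_for_table_text(driver, "wikitable taginfo-taglist"))
```

For the class value from the question, the helper produces the same selector as the CSS_SELECTOR example in this answer, table.wikitable.taginfo-taglist.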
Upvotes: 1