Reputation: 65
I have been trying to scrape the electoral votes of every president who has won a US presidential election from the large table on this page.
Here is the code I have been trying to use:
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time
import pandas
# using selenium and chromedriver to load the JavaScript-rendered wiki page
scrape_options = Options()
scrape_options.add_argument('--headless')
driver = webdriver.Chrome(r'web scraping master/chromedriver', options=scrape_options)
page_info = driver.get('https://en.wikipedia.org/wiki/United_States_presidential_election')
# waiting for the javascript to load
try:
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".wikitable.sortable.jquery-tablesorter")))
finally:
    page = driver.page_source
soup = BeautifulSoup(page, 'lxml')
table = soup.find('table', {'class': 'wikitable sortable jquery-tablesorter'})
#print(table)
rows = table.find_all('tr')
The code works fine up to this point. Here is the part of the code that is supposed to get the information I need.
for row in rows:
    need = row.find_all('td')
    for n in need:
        try:
            if len(n.find('b')==0):
                continue
            else:
                if nek.find('b').find('sup'):
                    continue
                electoral_votes = n.find('span', {'style': "position: relative margin: 0 0.3em;"}).get_text()
                print(electoral_votes)
        except: continue
After running this part of the code, it does not return anything I need.
Can someone help me out?
I'd be so grateful.
Upvotes: 0
Views: 157
Reputation: 84465
trying to scrape the electoral votes of all the presidents that have won the US presidential election
Since you want every presidential candidate who became president (Joe Biden is included even though he is only president-elect at the time of writing, 28/11/2020; he is easy to remove), I chose a method that loops over the table rows.
The table rows are deliberately restricted by a particular CSS selector to compensate for the table being irregular and to pick up only the bold winners in the presidential candidate column. I select at the row level so that I can then pick out the child elements needed to populate the output, which has the format {year: [winner, votes], ...}.
I use an attribute selector with the contains (*=) operator to target the year of interest by a title attribute containing the string 'United States presidential election'; a further CSS selector to get the winner (who has bold highlighting); and a regex to pull the votes out of the text of the tr element.
from bs4 import BeautifulSoup as bs
import re
import requests

soup = bs(requests.get('https://en.wikipedia.org/wiki/United_States_presidential_election').text, 'lxml')
# keep only rows whose third cell (after a rowspan cell) holds a bold, linked name
presidential_wins_by_year = {
    int(i.select_one('[title*="United States presidential election"]').text):  # year
        [i.select_one('td[rowspan] ~ td:nth-of-type(3) b a').text.strip(),  # winning candidate
         re.search(r'(\d+\s?/\s?\d+)', i.text).group(1)  # votes, e.g. "69 / 138"
         ]
    for i in soup.select('.sortable tr:has(td[rowspan] ~ td:nth-of-type(3) b a)')
}
print(presidential_wins_by_year)
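To try the selectors and the regex without hitting Wikipedia, here is a minimal sketch that runs the same logic on a hand-written table fragment. The HTML below is an assumption that only imitates the layout described above (a rowspan year cell, a bold linked winner in the third cell, and votes written as "won / total"); the function name winners_by_year is hypothetical, and html.parser is used instead of lxml only to avoid an extra dependency.

```python
import re
from bs4 import BeautifulSoup

# Hand-written fragment (an assumption): it imitates the table layout, it is not the real page.
HTML = """
<table class="wikitable sortable">
  <tr><th>Year</th><th>Party</th><th>Presidential candidate</th><th>Electoral votes</th></tr>
  <tr>
    <td rowspan="2"><a title="1788 United States presidential election">1788</a></td>
    <td>Independent</td>
    <td><b><a>George Washington</a></b></td>
    <td>69 / 138</td>
  </tr>
  <tr>
    <td>Federalist</td>
    <td><a>John Adams</a></td>
    <td>34 / 138</td>
  </tr>
  <tr>
    <td rowspan="2"><a title="1792 United States presidential election">1792</a></td>
    <td>Independent</td>
    <td><b><a>George Washington</a></b></td>
    <td>132 / 264</td>
  </tr>
  <tr>
    <td>Federalist</td>
    <td><a>John Adams</a></td>
    <td>77 / 264</td>
  </tr>
</table>
"""

# Third cell of a row that starts with a rowspan cell, containing a bold link.
WINNER = 'td[rowspan] ~ td:nth-of-type(3) b a'


def winners_by_year(html):
    """Return {year: [winner, votes]} for rows matching the winner selector."""
    soup = BeautifulSoup(html, 'html.parser')
    result = {}
    for row in soup.select(f'.sortable tr:has({WINNER})'):
        year = int(row.select_one('[title*="United States presidential election"]').text)
        winner = row.select_one(WINNER).text.strip()
        votes = re.search(r'(\d+\s?/\s?\d+)', row.text).group(1)
        result[year] = [winner, votes]
    return result


print(winners_by_year(HTML))
```

Rows without a rowspan cell or without a bold link are skipped by the :has() filter, which is why only the first row of each year is kept in this fragment.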
Upvotes: 1
Reputation: 28565
You can just use pandas to read in the HTML. pd.read_html returns all the tables on the page as a list of DataFrames, so it is just a matter of pulling out the table you're interested in:
Code:
import pandas as pd
url = 'https://en.wikipedia.org/wiki/United_States_presidential_election'
dfs = pd.read_html(url)
Output:
print(dfs[2].head(20).to_string())
    Year                  Party    Presidential candidate  Vice presidential candidate  Popular vote      %  Electoral votes  Notes
 0  1788            Independent         George Washington                 None[note 3]         43782  100.0         69 / 138    NaN
 1  1788             Federalist        John Adams[note 4]                 None[note 3]           NaN    NaN         34 / 138    NaN
 2  1788             Federalist                  John Jay                 None[note 3]           NaN    NaN          9 / 138    NaN
 3  1788             Federalist        Robert H. Harrison                 None[note 3]           NaN    NaN          6 / 138    NaN
 4  1788             Federalist             John Rutledge                 None[note 3]           NaN    NaN          6 / 138    NaN
 5  1788             Federalist              John Hancock                 None[note 3]           NaN    NaN          4 / 138    NaN
 6  1788    Anti-Administration            George Clinton                 None[note 3]           NaN    NaN          3 / 138    NaN
 7  1788             Federalist         Samuel Huntington                 None[note 3]           NaN    NaN          2 / 138    NaN
 8  1788             Federalist               John Milton                 None[note 3]           NaN    NaN          2 / 138    NaN
 9  1788             Federalist           James Armstrong                 None[note 3]           NaN    NaN          1 / 138    NaN
10  1788             Federalist          Benjamin Lincoln                 None[note 3]           NaN    NaN          1 / 138    NaN
11  1788    Anti-Administration            Edward Telfair                 None[note 3]           NaN    NaN          1 / 138    NaN
12  1792            Independent         George Washington                 None[note 3]         28579  100.0        132 / 264    NaN
13  1792             Federalist        John Adams[note 4]                 None[note 3]           NaN    NaN         77 / 264    NaN
14  1792  Democratic-Republican            George Clinton                 None[note 3]           NaN    NaN         50 / 264    NaN
15  1792  Democratic-Republican          Thomas Jefferson                 None[note 3]           NaN    NaN          4 / 264    NaN
16  1792  Democratic-Republican                Aaron Burr                 None[note 3]           NaN    NaN          1 / 264    NaN
17  1796             Federalist                John Adams                 None[note 3]         35726   53.4         71 / 276    NaN
18  1796  Democratic-Republican  Thomas Jefferson[note 5]                 None[note 3]         31115   46.6         68 / 276    NaN
19  1796             Federalist           Thomas Pinckney                 None[note 3]           NaN    NaN         59 / 276    NaN
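From a table in this shape, one possible way to get to the electoral votes is to split the "Electoral votes" column into two integers and keep the row with the most votes per year. The sketch below is a minimal example on a few rows copied from the output above rather than on the live page; the function name top_electoral_votes and the column names won and total are hypothetical. It also assumes every row has a "won / total" value, and that the candidate with the most electoral votes is the winner, which does not hold for every election (1824 was decided by the House), so treat it as a starting point.

```python
import pandas as pd

# A few rows copied from the printed table, so the example runs offline.
sample = pd.DataFrame({
    "Year": [1788, 1788, 1792, 1792, 1796, 1796],
    "Presidential candidate": [
        "George Washington", "John Adams[note 4]",
        "George Washington", "John Adams[note 4]",
        "John Adams", "Thomas Jefferson[note 5]",
    ],
    "Electoral votes": [
        "69 / 138", "34 / 138", "132 / 264", "77 / 264", "71 / 276", "68 / 276",
    ],
})


def top_electoral_votes(df):
    """Return, per year, the row with the highest electoral vote count."""
    # Split "69 / 138" into two integer columns (assumes every row matches).
    votes = df["Electoral votes"].str.extract(r"(\d+)\s*/\s*(\d+)").astype(int)
    out = df.assign(won=votes[0], total=votes[1])
    # Drop footnote markers such as "[note 4]" from the names.
    out["Presidential candidate"] = out["Presidential candidate"].str.replace(
        r"\[note \d+\]", "", regex=True
    )
    # idxmax gives the index label of the largest "won" value in each year.
    top = out.loc[out.groupby("Year")["won"].idxmax()]
    return top[["Year", "Presidential candidate", "won", "total"]].reset_index(drop=True)


result = top_electoral_votes(sample)
print(result.to_string(index=False))
```

On the real data the same function could be called as top_electoral_votes(dfs[2]), provided the table index and column names match what is printed above.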
Upvotes: 1