Abdur
Abdur

Reputation: 27

unable to extract table data using beautifulsoup

I have been trying to extract data from this table and having a lot of difficulty. any tips/help is appreciated http://finance.yahoo.com/options/lists/?mod_id=mediaquotesoptions&tab=tab2&rcnt=50

my code is as follows

import urllib
import urllib.request

from bs4 import BeautifulSoup
import pandas as pd
def make_soup(url):
    thepage=urllib.request.urlopen(url)
    soupdata=BeautifulSoup(thepage, "html.parser")
    return soupdata

soup=make_soup('http://finance.yahoo.com/options/lists/?mod_id=mediaquotesoptions&tab=tab2&rcnt=50')
soup.findAll('td')

Thanks

Upvotes: 1

Views: 2021

Answers (2)

hipoglucido
hipoglucido

Reputation: 545

In order to do this type of tasks, a little bit of html code exploration is always needed. I use element inspector of Chrome. In order to download the source you have many options. The easiest is to download it manually using a regular browser (which I'd recommend if you don't need many htmls). If you want to download it from Python I recommend using Selenium, which by default waits until the scripts of the webpage are executed.

import bs4
import pandas as pd

def cleanList(lst):
    #This function removes breakline elements from list lst
    return [i for i in lst if i !='\n']
url='http://finance.yahoo.com/options/lists/?mod_id=mediaquotesoptions&tab=tab2&rcnt=50'
#driver_path is the directory where you have your driver
browser=webdriver.Chrome(driver_path)       
browser.get(url)
soup = bs4.BeautifulSoup(browser.page_source, 'lxml')
#The target table is the third returned by find_all('table')
rows=soup.find_all('table')[3].find_all('tr')
#Fields are in the first row
fields=cleanList(rows[0].find_all(text=True))
#We will store each row as a dict in a list
elements=[]
for row in rows[1:]:
    values=cleanList(row.find_all(text=True))
    #Some elements that belong to the same cell are separated so the code gets a bit uglier
    values[4]='|'.join(values[4:6])
    del values[5]
    values[6]='|'.join(values[7:9])
    del values[7]
    values[8]='|'.join(values[8:10])
    del values[9]
    element=dict((f,str(v)) for f,v in zip(fields,values))
    elements.append(element)
#concat results
df=pd.DataFrame(elements)

This is the resulting DataFrame: table_yahoo.csv

Upvotes: 0

Padraic Cunningham
Padraic Cunningham

Reputation: 180391

The content is inside the script tag so there is no table in the source you get back, it is dynamically created, if you look at the actual source not in developer tools or firebug for the script tag starting with:

<script>YMedia.use("media-rmp", "media-viewport-loader", function(Y){Y.Global.Media.ViewportLoader.addContainers([{selector: "#mediaquotesoptions_container",callback: 

You will see the all the data is inside that.

You could try to parse the script tag that contains data or a more reliable way would be to use selenium, you can combine it with PhantomJS for headless browsing:

from selenium import webdriver
dr = webdriver.PhantomJS()
dr.get("http://finance.yahoo.com/options/lists/?mod_id=mediaquotesoptions&tab=tab2&rcnt=50")
table = dr.find_element_by_css_selector("div.yfi-panel-container.tabview-content")
headers = ",".join([th.text for th in table.find_elements_by_xpath(".//th/span")])
print(headers)
for row in table.find_elements_by_xpath(".//tr"):
    print(",".join([td.text.replace("\n","") for td in row.find_elements_by_xpath(".//td")]))

Which if we run gives:

In [22]: print(headers)
TICKER,OPTION SYMBOL,OPTION,CLOSE,CHANGE,VOLUME,VOLUMEN CHANGE,OPEN INTEREST,OPEN INTEREST CHANGE

In [23]: for row in table.find_elements_by_xpath(".//tr"):
   ....:         print(",".join([td.text.replace("\n","") for td in row.find_elements_by_xpath(".//td")]))
   ....:     

XLF,XLF160819P00023000,AUG 16 23 Put,0.60,+0.13(+27.70%),184,769,+184,719(+369438.00%),29,504,+24(+0.80%)
EMC,EMC160715P00023000,JUL 16 23 Put,0.25,-0.10(-28.60%),13,003,+12,997(+216616.70%),256,844,00.00%
KATE,KATE161021C00021000,OCT 16 21 Call,2.35,-0.30(-11.30%),7,012,+7,006(+116766.70%),334,+6(+22.20%)
EFA,EFA160617P00057000,JUN 16 57 Put,0.16,-0.11(-40.70%),10,565,+10,555(+105550.00%),197,134,+50.00%
CIEN,CIEN161021C00023000,OCT 16 23 Call,1.21,+0.38(+45.80%),39,838,+39,799(+102048.70%),884,+13(+17.30%)
GLD,GLD160715P00109000,JUL 16 109 Put,0.17,-0.22(-56.40%),6,484,+6,477(+92528.60%),14,014,+5(+0.40%)
XME,XME160617P00020500,JUN 16 20.5 Put,0.13,-0.10(-43.50%),10,054,+10,041(+77238.50%),102,974,00.00%
ATVI,ATVI160617C00040000,JUN 16 40 Call,0.19,-0.19(-50.00%),20,429,+20,401(+72860.70%),75,224,+7(+0.10%)
GDX,GDX160617C00021500,JUN 16 21.5 Call,3.95,+2.30(+139.40%),6,046,+6,037(+67077.80%),142,944,+10.00%
ALXN,ALXN160617C00160000,JUN 16 160 Call,2.20,-1.00(-31.30%),3,728,+3,722(+62033.30%),4,734,-5(-1.00%)
XOP,XOP160715P00034000,JUL 16 34 Put,1.09,+0.11(+11.20%),4,081,+4,074(+58200.00%),83,414,+6(+0.10%)
ABX,ABX160819C00020000,AUG 16 20 Call,1.49,+0.87(+140.30%),3,047,+3,041(+50683.30%),974,+6(+6.60%)
ETE,ETE160715C00016000,JUL 16 16 Call,0.75,-0.10(-11.80%),10,137,+10,116(+48171.40%),58,684,+20(+0.30%)
SLV,SLV160916C00018000,SEP 16 18 Call,0.28,+0.07(+33.30%),4,529,+4,519(+45190.00%),16,014,00.00%
AAL,AAL160617P00031000,JUN 16 31 Put,0.88,+0.37(+72.50%),13,147,+13,117(+43723.30%),10,204,-4(-0.40%)
FXE,FXE160715C00111000,JUL 16 111 Call,1.25,+0.73(+140.40%),3,058,+3,051(+43585.70%),9,914,-1(-0.10%)
XLU,XLU160617C00050000,JUN 16 50 Call,0.54,+0.29(+116.00%),15,493,+15,456(+41773.00%),185,764,-60.00%
GM,GM160715P00027000,JUL 16 27 Put,0.30,+0.08(+36.40%),6,675,+6,657(+36983.30%),1,534,+18(+13.30%)
XBI,XBI160617P00057500,JUN 16 57.5 Put,1.30,+0.55(+73.30%),2,181,+2,175(+36250.00%),16,234,+1,265(+353.40%)
HAL,HAL170120P00035000,JAN 17 35 Put,1.40,-0.14(-9.10%),2,136,+2,130(+35500.00%),78,844,00.00%
SLCA,SLCA160617C00031000,JUN 16 31 Call,1.50,+0.85(+130.80%),7,057,+7,037(+35185.00%),1,094,+20(+22.50%)
DD,DD160617C00070000,JUN 16 70 Call,0.56,+0.13(+30.20%),3,838,+3,825(+29423.10%),44,324,-3(-0.10%)
FXI,FXI160715C00034000,JUL 16 34 Call,0.82,-0.01(-1.20%),20,356,+20,279(+26336.40%),135,884,+42(+0.30%)
ALK,ALK160715P00060000,JUL 16 60 Put,0.95,+0.20(+26.70%),1,553,+1,547(+25783.30%),4,564,00.00%
SRPT,SRPT160617C00026000,JUN 16 26 Call,1.95,0.000.00%,3,073,+3,061(+25508.30%),1,354,-5(-3.60%)
NUE,NUE160617P00048000,JUN 16 48 Put,0.26,-0.24(-48.00%),6,040,+6,016(+25066.70%),9,604,-20(-2.00%)
VLO,VLO160715P00050000,JUL 16 50 Put,0.58,+0.08(+16.00%),6,023,+5,999(+24995.80%),4,124,+12(+3.00%)
EA,EA170120P00062500,JAN 17 62.5 Put,2.57,+0.07(+2.80%),1,505,+1,499(+24983.30%),2,344,+6(+2.60%)
JOY,JOY160715C00025000,JUL 16 25 Call,0.33,+0.01(+3.10%),2,436,+2,426(+24260.00%),1,664,+3(+1.80%)
DXJ,DXJ160715C00044780,JUL 16 44.78 Call,0.38,-0.16(-29.60%),6,676,+6,647(+22920.70%),347,114,+28(+0.10%)
AVGO,AVGO170120C00170000,JAN 17 170 Call,13.80,+3.00(+27.80%),1,594,+1,587(+22671.40%),41,784,+5(+0.10%)
LVS,LVS160715P00046000,JUL 16 46 Put,1.72,-0.03(-1.70%),3,392,+3,377(+22513.30%),12,214,+10(+0.80%)
T,T160916C00039000,SEP 16 39 Call,1.15,+0.25(+27.80%),4,179,+4,160(+21894.70%),1,174,+18(+18.20%)
EWZ,EWZ160916P00027000,SEP 16 27 Put,2.20,-0.46(-17.30%),2,463,+2,451(+20425.00%),249,804,00.00%
CF,CF160617P00030000,JUN 16 30 Put,1.85,-0.53(-22.30%),2,003,+1,993(+19930.00%),62,384,00.00%
NEM,NEM160715C00035000,JUL 16 35 Call,2.42,+1.38(+132.70%),1,286,+1,279(+18271.40%),24,514,00.00%
EBAY,EBAY160617P00024500,JUN 16 24.5 Put,0.72,+0.14(+24.10%),7,010,+6,971(+17874.40%),624,+29(+87.90%)
WWAV,WWAV160617C00047500,JUN 16 47.5 Call,0.45,+0.05(+12.50%),1,760,+1,750(+17500.00%),19,654,+10(+0.50%)
UUP,UUP160916C00025000,SEP 16 25 Call,0.19,-0.14(-42.40%),3,609,+3,588(+17085.70%),160,754,+13(+0.10%)
AEM,AEM160617P00047000,JUN 16 47 Put,0.53,-2.17(-80.40%),3,426,+3,406(+17030.00%),3,364,00.00%
IWM,IWM160617C00120000,JUN 16 120 Call,0.10,-0.11(-52.40%),15,260,+15,170(+16855.60%),624,424,+480(+0.80%)
PSX,PSX170120P00065000,JAN 17 65 Put,2.10,+0.15(+7.70%),1,339,+1,331(+16637.50%),10,084,+4(+0.40%)
LLY,LLY160715P00072500,JUL 16 72.5 Put,1.47,+0.16(+12.20%),8,799,+8,745(+16194.40%),103,054,+18(+0.20%)
GDXJ,GDXJ170120C00045000,JAN 17 45 Call,3.40,+1.35(+65.90%),1,607,+1,597(+15970.00%),40,244,-10(-0.20%)
YHOO,YHOO170120P00028000,JAN 17 28 Put,0.86,+0.04(+4.90%),3,528,+3,505(+15239.10%),181,584,+15(+0.10%)
SO,SO160617C00050000,JUN 16 50 Call,0.65,+0.34(+109.70%),1,486,+1,476(+14760.00%),46,594,-8(-0.20%)
SH,SH160819C00021000,AUG 16 21 Call,0.30,0.000.00%,2,228,+2,213(+14753.30%),13,964,+15(+1.10%)
GS,GS160617P00145000,JUN 16 145 Put,0.32,+0.08(+33.30%),7,086,+7,038(+14662.50%),24,794,-43(-1.70%)
JD,JD160715P00021000,JUL 16 21 Put,0.50,+0.10(+25.00%),1,916,+1,903(+14638.50%),1,074,+9(+9.20%)
DAL,DAL160617C00042000,JUN 16 42 Call,0.73,-0.52(-41.60%),17,138,+17,021(+14547.90%),191,164,-80.00%

Upvotes: 1

Related Questions