Reputation: 398
I cannot extract data from HTML tables (the tbody tag). I would very much like to be proved wrong here.
Here is my code:
import sys
from datetime import datetime

import lxml.html as LH
import pandas as pd
import requests

start_time = datetime.now()

def text(elt):
    # Replace non-breaking spaces with regular spaces
    return elt.text_content().replace(u'\xa0', u' ')

try:
    url = 'https://www.byma.com.ar/acciones/panel/general'
    r = requests.get(url)
except requests.exceptions.Timeout as e:
    print(e)
    sys.exit(1)
except requests.exceptions.TooManyRedirects as e:
    print(e)
    sys.exit(1)
except requests.exceptions.RequestException as e:
    print(e)
    sys.exit(1)

root = LH.fromstring(r.content)
for table in root.xpath('//*[@id="dataStocks"]'):
    # Use paths relative to the table; a leading // searches the whole document
    header = [text(th) for th in table.xpath('./thead//th')]
    data = [[text(td) for td in tr.xpath('./td')]
            for tr in table.xpath('./tbody/tr')]
    # Keep only rows whose cell count matches the header
    data = [row for row in data if len(row) == len(header)]
    data = pd.DataFrame(data, columns=header)
    print(data)
I only get the header columns, with no data rows :S
Upvotes: 0
Views: 160
Reputation: 52665
The values you want are dynamic data: they are absent from the initial page source and are received afterwards via XHR. You can get those values as below:
import requests

url = "https://www.byma.com.ar/wp-admin/admin-ajax.php?action=get_panel&panel_id=2"
response = requests.get(url)
data = response.json()  # parse the JSON body of the XHR response
for entry in data["Cotizaciones"]:
    print(entry)
The output for each entry is something like:
{'Apertura': 8.5, 'Cantidad_Nominal_Compra': 17346,
 'Cantidad_Nominal_Venta': 21569, 'Cantidad_Operaciones': '2409',
 'Cierre_Anterior': 8.65,
 'Denominacion': 'GRUPO FINANCIERO VALORES SOCIEDAD ANONIMA',
 'Estado': '', 'Ex': 'No', 'Hora_Cotizacion': '17:05:53',
 'Maximo': 8.54, 'Minimo': 7.95, 'Monto_Operado_Pesos': 89376607,
 'Precio_Compra': 7.95, 'Precio_Promedio': 8.21,
 'Precio_Promedio_Ponderado': 8.1886, 'Precio_Venta': 7.96,
 'Simbolo': 'VALO', 'Tendencia': 0, 'Tipo_Liquidacion': 'Pesos',
 'Ultimo': 7.96, 'Variacion': -7.98, 'Vencimiento': '48hs',
 'Volumen_Nominal': 10896556}
You can also get each value from entry separately, e.g.
print(entry['Apertura'])
Output:
8.5
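Since the question was trying to build a table, here is a minimal sketch of turning the entries into a header plus rows. It assumes the response keeps the shape shown above (a list of dicts under "Cotizaciones"). SAMPLE, entries_to_table and the chosen column list are hypothetical names for illustration, and SAMPLE only copies a few fields from the entry printed above so that the snippet runs without a network call.

```python
# Sketch only: SAMPLE stands in for response.json(); its values are copied
# from the single entry shown above. The helper name is an assumption.
SAMPLE = {
    "Cotizaciones": [
        {
            "Simbolo": "VALO",
            "Apertura": 8.5,
            "Ultimo": 7.96,
            "Variacion": -7.98,
            "Cantidad_Operaciones": "2409",
        }
    ]
}


def entries_to_table(entries, columns):
    """Return (header, rows); a field missing from an entry becomes None."""
    rows = [[entry.get(col) for col in columns] for entry in entries]
    return list(columns), rows


columns = ["Simbolo", "Apertura", "Ultimo", "Variacion", "Cantidad_Operaciones"]
header, rows = entries_to_table(SAMPLE["Cotizaciones"], columns)

# In the sample output Cantidad_Operaciones is a string ('2409'),
# so convert it to an int where it is present.
ops = header.index("Cantidad_Operaciones")
for row in rows:
    if row[ops] is not None:
        row[ops] = int(row[ops])

print(header)
print(rows)
```

With real data you would pass data["Cotizaciones"] instead of SAMPLE["Cotizaciones"]; the resulting rows and header can then go into pd.DataFrame(rows, columns=header), the same way the code in the question builds its DataFrame.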
Upvotes: 1