Reputation: 33
I am trying to scrape a table from http://marine-transportation.capitallink.com/indices/baltic_exchange_history.html?ticker=BDI
Although it seemed fairly easy, I cannot identify the table in a way that lets me scrape it, so I am unable to extract the data. Can anyone help me identify the table correctly?
from bs4 import BeautifulSoup
import pandas as pd
import requests

url = 'http://marine-transportation.capitallink.com/indices/baltic_exchange_history.html?ticker=BDI'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')

date = []
closing_rate = []

# Here I need a reference to the correct table
table = soup.find()
# Skip the header row, then read the first two cells of each data row
for row in table.find_all('tr')[1:]:
    col = row.find_all('td')
    column_1 = col[0].string.strip()
    date.append(column_1)
    column_2 = col[1].string.strip()
    closing_rate.append(column_2)

columns = {'date': date, 'closing_rate': closing_rate}
df = pd.DataFrame(columns)
df.to_csv('Baltic_Dry.csv')
Upvotes: 0
Views: 2437
Reputation: 204
You could use unique style attributes to identify the table you need.
For example, on the page I checked, the table containing the index data appears to be 550px wide. You can use:
soup.find_all('table', width="550")
Please note: I had to use another page on the same website because the one you posted requires a login. Hopefully the page structure will be similar.
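To show how that lookup could feed the loop from the question, here is a minimal sketch. It runs against a small inline HTML sample rather than the real site, so the markup, the values in it, and the helper name `extract_rates` are all assumptions made for illustration; the real page may be laid out differently.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page (values are made up)
SAMPLE_HTML = """
<table width="100%"><tr><td>Site navigation</td></tr></table>
<table width="550">
  <tr><th>Date</th><th>Closing Rate</th></tr>
  <tr><td> 2017-01-03 </td><td> 953 </td></tr>
  <tr><td> 2017-01-04 </td><td> 969 </td></tr>
</table>
"""


def extract_rates(html):
    """Return the date and closing-rate columns from the first 550px-wide table."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list, so pick the first match explicitly
    tables = soup.find_all("table", width="550")
    if not tables:
        raise ValueError("no table with width=550 found")

    dates, closing_rates = [], []
    # Skip the header row, as in the question's loop
    for row in tables[0].find_all("tr")[1:]:
        cells = row.find_all("td")
        if len(cells) < 2:
            continue  # ignore spacer or malformed rows
        dates.append(cells[0].get_text(strip=True))
        closing_rates.append(cells[1].get_text(strip=True))
    return {"date": dates, "closing_rate": closing_rates}


columns = extract_rates(SAMPLE_HTML)
print(columns)
```

The returned dict has the same shape as `columns` in the question, so it can be passed to `pd.DataFrame` and written out with `to_csv` as before. Using `get_text(strip=True)` instead of `.string.strip()` avoids an error when a cell contains nested tags, since `.string` is `None` in that case.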
Upvotes: 2