webscraping with beautiful soup

Question

I'm trying to scrape the table from a website that contains the Current interests rates. I used python with beautiful soup but I can't locate the html parts. Please send help ! thank you.

I only need to scrape the current interest rates table, not everything else and convert it into csv file. Here is the link to my webiste: https://www.global-rates.com/en/interest-rates/libor/american-dollar/usd-libor-interest-rate-12-months.aspx here is the picture of the current interests rate table:

I tried something like this:

import bs4 
import requests
from bs4 import BeautifulSoup
import pandas as pd 

URL = 'https://www.global-rates.com/en/interest-rates/libor/american-dollar/usd-libor-interest-rate-12-months.aspx' 

response = requests.get(URL)
soup=bs4.BeautifulSoup(response.content, 'html.parser')

print(soup.title)
print(soup.title.string)
print(len(response.text))

table = soup.find('table', attrs = {'class':'tableheader'}).tbody
print(table)

columns = ['Current interest rates']
df = pd.DataFrame(columns = columns)

trs = table.find_all('tr')
for tr in trs:
    tds = tr.find_all('td')
    row = [td.text.replace('
', '') for td in tds]
    df = df.append(pd.Series(row, index = columns), ignore_index = True)
df.to_csv('libor.csv', index = False)

but this gave me attribute errors: "None Type' object has no attribute 'tbody'

oh I also want to make automatically scraping the Mondays' interests rate if that's possible. Thank you for your help

Adam Williamson · Accepted Answer

Here is my attempt with just pandas

import pandas as pd

# Get all tables on page
dfs = pd.read_html('https://www.global-rates.com/en/interest-rates/libor/american-dollar/usd-libor-interest-rate-12-months.aspx')

# Find the Current interest rates table
df = [df for df in dfs if df.iloc[0][0] == 'Current interest rates'][0]

# Remove first row that contains column names
df = df.iloc[1:].copy()

# Set column names
df.columns = ['DATE','INTEREST_RATE']

# Convert date from november 02 2020 to 2020-11-02
df['DATE'] = pd.to_datetime(df['DATE'])

# Remove percentage sign from interest rate
df['INTEREST_RATE'] = df['INTEREST_RATE'].str.replace('%','').str.strip()

# Convert percentage to float type
df['INTEREST_RATE'] = df['INTEREST_RATE'].astype(float)

# Add day of the week column
df['DAY'] = df['DATE'].dt.day_name()

# Output all to CSV
df.to_csv('all_data.csv', index=False)

# Only Mondays
df_monday = df[df['DAY'] == 'Monday']

# Output only Mondays
df_monday.to_csv('monday_data.csv', index=False)

# Add day number of week (Monday = 0)
df['DAY_OF_WEEK_NUMBER'] = df['DATE'].dt.dayofweek

# Add week number of year
df['WEEK_OF_YEAR_NUMBER'] = df['DATE'].dt.weekofyear

# 1. Sort by week of year then day of week
# 2. Group by week of year
# 3. Select first record in group, which will be the earliest day available of that week
df_first_day_of_week = df.sort_values(['WEEK_OF_YEAR_NUMBER','DAY_OF_WEEK_NUMBER']).groupby('WEEK_OF_YEAR_NUMBER').first()

# # Output earliest day of the week data
df_first_day_of_week.to_csv('first_day_of_week.csv', index=False)

webscraping with beautiful soup

Answers (2)

Related Questions