aclark904

Reputation: 23

Using Beautiful Soup to grab forex prices

I'm new to Beautiful Soup and have been following tutorials on scraping with it. I am trying to use it to return the high and low prices of common forex pairs. I'm not sure whether the problem is the site I'm trying to get the information from. I can find the div tag that holds the data, and I believe the text is hidden in a span, but my lookup keeps coming back as None.

Can anyone help me figure this out?

url : https://www.centralcharts.com/en/6748-aud-nzd/quotes

The whole table sits inside this div:

div class="tabMini-wrapper"

Is it because of the format the site has it in?

import requests
from bs4 import BeautifulSoup
import re

URL = "https://www.centralcharts.com/en/6748-aud-nzd/quotes"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

spans=soup.find('span', attrs = {'class' , 'tabMini tabQuotes'})
print(spans)

I tried a bunch of different ways, but this was my most recent attempt. I was trying to get the data from the span after .find() returned None for the table.

Upvotes: 2

Views: 89

Answers (2)

Life is complex

Reputation: 15629

UPDATED ANSWER 11-25-2022

Sorry for the slow reply; I was away from my computer for the American holiday. Here is another way to accomplish your task. This one uses several list comprehensions and zip() to iterate over the data.

import requests
from bs4 import BeautifulSoup

URL = "https://www.centralcharts.com/en/6748-aud-nzd/quotes"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")


# Locate the table that follows the "5 days quotes" heading.
table_data = soup.select_one('h2:-soup-contains("5 days quotes")').find_next('table')
# The column dates are stored in the data-date attribute of the header spans.
dates = [element.get('data-date') for element in table_data.find_all('span', class_='set-locale-date')]
# The cells to the right of the "High" and "Low" labels hold the daily values.
daily_high_quotes = [element.text for element in table_data.find('td', string='High').find_next_siblings('td')]
daily_low_quotes = [element.text for element in table_data.find('td', string='Low').find_next_siblings('td')]
# Pair each date with its high and low quote.
results = [list(row) for row in zip(dates, daily_high_quotes, daily_low_quotes)]
print(results)

Output:

[['2022-11-21', '1.0854', '1.0811'], 
['2022-11-22', '1.0837', '1.0782'], 
['2022-11-23', '1.0837', '1.0746'], 
['2022-11-24', '1.0814', '1.0765'], 
['2022-11-25', '1.0822', '1.0796']]
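
The chained calls in the code above raise AttributeError on None if the heading or a label cell is missing, which is the same kind of failure the question describes. Below is a minimal sketch of a more defensive lookup. The SAMPLE_HTML markup is a hypothetical, heavily simplified stand-in for the real page, and find_table_after_heading and row_values are illustrative helper names, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page is larger and may differ.
SAMPLE_HTML = """
<h2>5 days quotes</h2>
<table>
  <tr><td>High</td><td>1.0814</td><td>1.0822</td></tr>
  <tr><td>Low</td><td>1.0765</td><td>1.0796</td></tr>
</table>
"""


def find_table_after_heading(soup, heading_text):
    """Return the first <table> after an <h2> containing heading_text, or None."""
    for heading in soup.find_all("h2"):
        if heading_text in heading.get_text():
            return heading.find_next("table")
    return None


def row_values(table, label):
    """Return the cell texts to the right of the <td> whose text is label."""
    label_cell = table.find("td", string=label)
    if label_cell is None:
        return []
    return [td.get_text(strip=True) for td in label_cell.find_next_siblings("td")]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = find_table_after_heading(soup, "5 days quotes")
if table is None:
    raise SystemExit("quotes table not found; the page layout may have changed")

highs = row_values(table, "High")
lows = row_values(table, "Low")
print(highs, lows)
```

Returning None or an empty list from the helpers lets the caller decide how to react, instead of failing somewhere in the middle of a long chain.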

ORIGINAL ANSWER 11-24-2022

There are multiple tables on the page, so you need to extract the data from the correct one. This code will help.

import requests
from bs4 import BeautifulSoup

URL = "https://www.centralcharts.com/en/6748-aud-nzd/quotes"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

for tag_element in soup.find_all('table', attrs={'class': 'tabMini tabQuotes'}):
    for item in tag_element.find_all('td'):
        print(item)
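
To illustrate why the attempt in the question printed None, here is a small sketch on inline markup. The SAMPLE_HTML string is a hypothetical simplification of the page; it only assumes, as the loop above does, that the tabMini tabQuotes classes sit on a table element rather than on a span.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page is larger and may differ.
SAMPLE_HTML = """
<div class="tabMini-wrapper">
  <table class="tabMini tabQuotes">
    <tr><td>High</td><td><span>1.0854</span></td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The classes belong to the <table>, so no <span> matches and find() returns None.
missing = soup.find("span", class_="tabQuotes")
print(missing)

# Searching the right tag name succeeds; class_ matches a single CSS class.
table = soup.find("table", class_="tabQuotes")
if table is None:
    raise SystemExit("table not found; the page layout may have changed")

cells = [td.get_text(strip=True) for td in table.find_all("td")]
print(cells)
```

Checking the result of find() for None before using it turns a confusing AttributeError into an explicit message.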

Upvotes: 1

0stone0

Reputation: 44073

OK, version 2.


It seems like the OP wants to capture the complete table, including the dates.

Since this is an HTML table, you'll need to make a custom loop that will map both the headers (th) and rows (tr > td)


Steps the script takes:

  1. Find the table

  2. For each header, append the data-date to the result object

  3. Find all the tr's in the table

  4. Index 5 is high, index 6 is low (this could be improved by searching for the text)

  5. Ensure high and low have the same number of items

  6. Map the index of the row to the index of the header and append to the result (this could probably be improved using something like zip())

import requests
from bs4 import BeautifulSoup

result = []

URL = "https://www.centralcharts.com/en/6748-aud-nzd/quotes"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

table = soup.select('.prod-left-content .tabMini-wrapper table')[0]

for header in table.find('thead').find_all('span', attrs = { 'class': 'set-locale-date' }):
    result.append({ 'date': header.get('data-date'), 'high': None, 'low': None })

trs = table.select('tbody > tr')

high = trs[5].find_all('span')
low  = trs[6].find_all('span')

if len(high) != len(low):
    print('Mmmm, something went wrong?')
    exit()

for i in range(len(high)):
    result[i]['low'] = low[i].text
    result[i]['high'] = high[i].text

for o in result:
    print(o['date'] + "\t\t" + o['high'] + "\t" + o['low'])

Gives:

2022-11-18      1.0915  1.0834
2022-11-21      1.0854  1.0811
2022-11-22      1.0837  1.0782
2022-11-23      1.0837  1.0746
2022-11-24      1.0812  1.0765
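
The two improvements mentioned in the steps (finding the rows by their label text instead of by index, and pairing the columns with zip()) might look roughly like this. It is a sketch under assumptions: SAMPLE_HTML is a hypothetical, cut-down version of the table, and find_row and span_texts are illustrative helper names.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real table has more rows and columns.
SAMPLE_HTML = """
<table>
  <thead><tr>
    <th></th>
    <th><span class="set-locale-date" data-date="2022-11-23"></span></th>
    <th><span class="set-locale-date" data-date="2022-11-24"></span></th>
  </tr></thead>
  <tbody>
    <tr><td>High</td><td><span>1.0837</span></td><td><span>1.0812</span></td></tr>
    <tr><td>Low</td><td><span>1.0746</span></td><td><span>1.0765</span></td></tr>
  </tbody>
</table>
"""


def find_row(table, label):
    """Return the <tr> whose first cell text equals label, or None."""
    for tr in table.select("tbody > tr"):
        first_cell = tr.find("td")
        if first_cell is not None and first_cell.get_text(strip=True) == label:
            return tr
    return None


def span_texts(row):
    """Return the text of every <span> in the row, in document order."""
    return [span.get_text(strip=True) for span in row.find_all("span")]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table")

dates = [
    span.get("data-date")
    for span in table.find("thead").find_all("span", class_="set-locale-date")
]
high_row = find_row(table, "High")
low_row = find_row(table, "Low")
if high_row is None or low_row is None:
    raise SystemExit("High or Low row not found")

# zip() pairs the columns positionally and stops at the shortest list.
result = [
    {"date": d, "high": h, "low": l}
    for d, h, l in zip(dates, span_texts(high_row), span_texts(low_row))
]
for entry in result:
    print(f"{entry['date']}\t{entry['high']}\t{entry['low']}")
```

Looking rows up by label keeps the script working if the site inserts or reorders rows, which fixed indexes such as trs[5] and trs[6] would not survive.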

Upvotes: 2
