Reputation: 23
I'm new to using Beautiful Soup
and I have been following tutorials on scraping with it. I am trying to use it to return the high and low prices for common forex pairs. I'm not sure if it is the sites that I'm trying to get the information from, but I can find the div tag
that I want the info from. I believe the text is hidden in the span, but I am still having trouble with it coming back NoneType.
Can anyone help me figure this out?
URL: https://www.centralcharts.com/en/6748-aud-nzd/quotes
The whole table is inside div class="tabMini-wrapper".
Is it because of the format the site has it in?
import requests
from bs4 import BeautifulSoup
import re
URL = "https://www.centralcharts.com/en/6748-aud-nzd/quotes"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
spans=soup.find('span', attrs = {'class' , 'tabMini tabQuotes'})
print(spans)
I tried a bunch of different ways, but this was my most recent attempt. I was trying to get the text from the span, but .find()
returned NoneType
for the table.
Upvotes: 2
Views: 89
Reputation: 15629
Sorry for the slow reply; I was away from my computer for the American holiday. Here is another way to accomplish your task. This one uses multiple list comprehensions and zip() to iterate over the data.
import requests
from bs4 import BeautifulSoup

URL = "https://www.centralcharts.com/en/6748-aud-nzd/quotes"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

results = []

# Locate the "5 days quotes" table via its heading
table_data = soup.select_one('h2:-soup-contains("5 days quotes")').find_next('table')
dates = [element.get('data-date') for element in table_data.find_all('span', attrs={'class': 'set-locale-date'})]
daily_high_quotes = [element.text for element in table_data.find('td', string='High').find_next_siblings('td')]
daily_low_quotes = [element.text for element in table_data.find('td', string='Low').find_next_siblings('td')]

for quote_date, daily_high, daily_low in zip(dates, daily_high_quotes, daily_low_quotes):
    results.append([quote_date, daily_high, daily_low])
print(results)
Output:
[['2022-11-21', '1.0854', '1.0811'],
['2022-11-22', '1.0837', '1.0782'],
['2022-11-23', '1.0837', '1.0746'],
['2022-11-24', '1.0814', '1.0765'],
['2022-11-25', '1.0822', '1.0796']]
There are multiple tables on the page, so you need to extract the data from the correct one. This code will help.
import requests
from bs4 import BeautifulSoup

URL = "https://www.centralcharts.com/en/6748-aud-nzd/quotes"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

for tag_element in soup.find_all('table', attrs={'class': 'tabMini tabQuotes'}):
    for item in tag_element.find_all('td'):
        print(item)
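As for why the original find() came back NoneType: the class tabMini tabQuotes sits on the table element, not on a span, and attrs needs to be a dict rather than a set. A minimal offline sketch (using a trimmed-down stand-in for the page's HTML, not the live site) shows the difference:

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the page markup (illustrative, not the real HTML)
html = '<table class="tabMini tabQuotes"><tr><td><span>1.0854</span></td></tr></table>'
soup = BeautifulSoup(html, 'html.parser')

# Searching a <span> for a class that lives on the <table> returns None
print(soup.find('span', attrs={'class': 'tabMini tabQuotes'}))  # None

# Searching the right tag name with a proper dict finds the table
table = soup.find('table', attrs={'class': 'tabMini tabQuotes'})
print(table.find('span').text)  # 1.0854
```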
Upvotes: 1
Reputation: 44073
Okay, version 2.
It seems like the OP wants to capture the complete table, with dates.
Since this is an HTML table, you'll need a custom loop that maps both the headers (th) and the rows (tr > td).
Steps the script takes:
- Find the table
- For each header, append the data-date to the result object
- Find all the tr's in the table
- Index 5 is high, index 6 is low (this could be improved by searching for the text)
- Ensure high and low have the same number of items
- Map the index of the row with the index of the header, and append to result (could probably be improved using something like zip())
import requests
from bs4 import BeautifulSoup

result = []

URL = "https://www.centralcharts.com/en/6748-aud-nzd/quotes"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

table = soup.select('.prod-left-content .tabMini-wrapper table')[0]

# One result entry per date header
for header in table.find('thead').find_all('span', attrs={'class': 'set-locale-date'}):
    result.append({'date': header.get('data-date'), 'high': None, 'low': None})

trs = table.select('tbody > tr')
high = trs[5].find_all('span')
low = trs[6].find_all('span')

if len(high) != len(low):
    print('Mmmm, something went wrong?')
    exit()

for i in range(len(high)):
    result[i]['low'] = low[i].text
    result[i]['high'] = high[i].text

for o in result:
    print(o['date'] + "\t\t" + o['high'] + "\t" + o['low'])
Gives:
2022-11-18 1.0915 1.0834
2022-11-21 1.0854 1.0811
2022-11-22 1.0837 1.0782
2022-11-23 1.0837 1.0746
2022-11-24 1.0812 1.0765
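The zip() improvement mentioned in the last step could look something like this (a minimal sketch with made-up sample values standing in for the scraped spans, not live quotes):

```python
# Sketch of the zip() improvement: pair each date directly with its
# high/low value instead of indexing result[i]. The values here are
# made-up placeholders, not live data.
dates = ['2022-11-21', '2022-11-22', '2022-11-23']
highs = ['1.0854', '1.0837', '1.0837']
lows = ['1.0811', '1.0782', '1.0746']

result = [{'date': d, 'high': h, 'low': l} for d, h, l in zip(dates, highs, lows)]

for o in result:
    print(o['date'] + "\t\t" + o['high'] + "\t" + o['low'])
```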
Upvotes: 2