Reputation: 27
I want to scrape a particular financial website, but I have never done this before. I don't understand HTML, so it is very difficult for me. I want to learn, because I need a working example to get started with scraping a lot of tables. The site belongs to a Chilean institution, the "Comisión para el Mercado Financiero". The URL is: "http://www.cmfchile.cl/institucional/inc/valores_cuota/valor_serie.php?v1=C1KB5&v2=LPKA0ISQAKEHITB64IBM&v3=4ABCIV864AJ35MN64IBM&v4=V864A4ABCI&v5=J35MNS8IYM&v6=4ABCIV864A4ABCIV864A&v7=V864AISQAK&v8=V864A64IBM&v9=37G70LN68AGLD87IEAIXGLD87OL18863409LN68AOL188JKT99QHFLBMLXL410163LN68A&v10=21QYE48BCX99KWAEF88BWM6YB&v11=63409LN68AGLD8737GH0J35MN&v12=63409LN68AGLD8737GH04ABCI"
Can someone tell me how to do that? I know it can be done with the BeautifulSoup and requests modules, but nothing more. A recommendation for a book on web scraping in Python would also be very helpful, if there is one.
Upvotes: 0
Views: 192
Reputation: 618
Fully working code. You may have to wait a while (about 10 minutes or less) to get your results, so please reply once it has finished.
Because your link serves dynamic data, the data takes time to load. A plain request can time out before that happens, which is why you get no response (BeautifulSoup itself only parses the HTML it is given). A good approach here is Selenium, since it waits until the page has loaded.
After trying many different ways to get the data, here is the final solution.
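from selenium import webdriver
from selenium.webdriver.common.by import By

rows = []
# Path to your local ChromeDriver; newer Selenium releases expect a Service object instead
driver = webdriver.Chrome('P:/selenium/driver/chromedriver.exe')
driver.get('YOUR LINK HERE')
# Collect every row of the results table
data = driver.find_elements(By.XPATH, '//*[@id="main"]/div/div[2]/table/tbody/tr')
for i in range(len(data)):
    if i == 1:  # Skip this row because it contains irrelevant data
        continue
    rows.append(data[i].text.split(" "))
print(rows)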
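Since the goal is to scrape a lot of tables, you will probably want to save `rows` rather than only print it. Here is a minimal sketch using only the standard library. The helper name `rows_to_csv` and the sample values are my own assumptions for illustration, not data taken from the CMF page.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of rows (each a list of strings) to CSV text."""
    buffer = io.StringIO()
    # Use "\n" line endings so the text looks the same on every platform
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample in the same shape as `rows` (a list of lists of strings)
sample_rows = [["Fecha", "Valor"], ["01/01/2020", "1234.56"]]
print(rows_to_csv(sample_rows))
```

To write a file instead, you could pass the returned text to `open("table.csv", "w").write(...)`; the file name is just an example.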
Upvotes: 0
Reputation: 11525
import requests
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0'
}


def main(url):
    r = requests.get(url, headers=headers)
    df = pd.read_html(r.content)[0]
    print(df)


main("http://www.cmfchile.cl/institucional/inc/valores_cuota/valor_serie.php?v1=C1KB5&v2=LPKA0ISQAKEHITB64IBM&v3=4ABCIV864AJ35MN64IBM&v4=V864A4ABCI&v5=J35MNS8IYM&v6=4ABCIV864A4ABCIV864A&v7=V864AISQAK&v8=V864A64IBM&v9=37G70LN68AGLD87IEAIXGLD87OL18863409LN68AOL188JKT99QHFLBMLXL410163LN68A&v10=21QYE48BCX99KWAEF88BWM6YB&v11=63409LN68AGLD8737GH0J35MN&v12=63409LN68AGLD8737GH04ABCI")
Upvotes: 0
Reputation: 140
As you rightly mentioned, this is "web scraping", and Python has excellent modules for it. It is important to understand the technicalities before going further.
One of the most widely used modules is BeautifulSoup.
So, to get the info from any webpage,
Solution -
there are many ways, like
Once you reach this point, it should be clear how we are going to proceed:
import requests
from bs4 import BeautifulSoup

# Make a request to the webpage and grab the HTML response
page = requests.get("your url here")
# Pass the response body on to BeautifulSoup
soup = BeautifulSoup(page.content, 'html.parser')
# Depending on what you want to find, you can use find(), find_all(), select(), and other methods
soup.find_all('your tag')
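For a table like the one in the question, the tag search could look roughly like the sketch below. It runs on a small inline HTML string so you can try it without hitting the site. The helper name `table_to_rows` and the sample markup are assumptions for illustration; the real CMF page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not copied from the CMF page
SAMPLE_HTML = """
<table>
  <tr><th>Fecha</th><th>Valor Cuota</th></tr>
  <tr><td>01/01/2020</td><td>1234.56</td></tr>
</table>
"""


def table_to_rows(html):
    """Return the first table in `html` as a list of rows of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Header cells (th) and data cells (td) are read the same way
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


print(table_to_rows(SAMPLE_HTML))
```

Reading each cell separately avoids breaking apart values that contain spaces, such as a column named "Valor Cuota".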
Upvotes: 1