Reputation: 451
I need to get some data from a site. The problem is that they don't have a public API for it, so I thought of downloading the HTML file and then searching it for the data I want. I'm just not sure whether that's even possible. I think it should be, right?
The flow would be:
1. first download the HTML file
2. ....crawl
This is the page that has the data I want: https://www.forexfactory.com/calendar.php
I'm not sure how to crawl the page as a string, because the data sits in a table. They actually do have a public API for the XML file, but it excludes the data I want, which is the "actual" column.
How can I crawl the table and get that "actual" column from the HTML file? I already have the other details, like the title/event name, from their XML file. Any help is appreciated, thanks.
Upvotes: 0
Views: 66
Reputation: 142
A good approach is to work with Python's requests and BeautifulSoup4 libraries.
First you make an HTTP request with (you guessed it) requests, then you parse the returned HTML with bs4 (BeautifulSoup4):
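import requests
from bs4 import BeautifulSoup

# Fetch the page; replace the placeholder with the URL you want to scrape
response = requests.get("Your Website")
response.raise_for_status()  # stop early on an HTTP error
# Parse the HTML (the 'lxml' parser requires the lxml package to be installed)
soup = BeautifulSoup(response.text, 'lxml')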
Now you can inspect your "soup" and scrape the data you want, for example with soup.find_all() or soup.select().
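To make the table part more concrete, here is a minimal sketch of pulling an "actual" column out of table rows. It runs on a small made-up HTML sample rather than the live page, and the class names (calendar__row, calendar__event, calendar__actual) are assumptions, so inspect the real page in your browser's developer tools and adjust the selectors. The helper name extract_actuals is hypothetical as well. It uses the built-in "html.parser" so it works without lxml.

```python
from bs4 import BeautifulSoup

# Made-up sample markup standing in for the downloaded page.
# The class names are assumptions; check the real page and adjust them.
SAMPLE_HTML = """
<table>
  <tr class="calendar__row">
    <td class="calendar__event">CPI m/m</td>
    <td class="calendar__actual">0.3%</td>
  </tr>
  <tr class="calendar__row">
    <td class="calendar__event">Unemployment Rate</td>
    <td class="calendar__actual"></td>
  </tr>
</table>
"""


def extract_actuals(html):
    """Return a list of (event, actual) pairs, one per matching table row."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.select("tr.calendar__row"):
        event_cell = row.select_one("td.calendar__event")
        actual_cell = row.select_one("td.calendar__actual")
        if event_cell is None or actual_cell is None:
            continue  # skip rows that are missing either cell
        event = event_cell.get_text(strip=True)
        # An empty cell (value not published yet) becomes None
        actual = actual_cell.get_text(strip=True) or None
        results.append((event, actual))
    return results


if __name__ == "__main__":
    for event, actual in extract_actuals(SAMPLE_HTML):
        print(event, actual)
```

With the real page you would pass response.text instead of SAMPLE_HTML. Since you already have the event names from the XML file, you can match the scraped pairs against them by event name. If the table comes back empty, the site may be building it with JavaScript or blocking the request, in which case plain requests will not be enough.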
Upvotes: 1