Dwyte

Reputation: 451

Downloading a site's HTML page and crawling it for the desired data, because they don't have a public API

So I need to get some data from a site. The problem is they don't have a public API for it, so I thought of downloading the HTML file and then searching it for the data I want. I'm just not sure if it's even possible to do it; I think it should be, right?

The flow would be:
1. Download the HTML file.
2. Crawl the page that has the data I want (https://www.forexfactory.com/calendar.php).

I'm not sure how I will crawl the page as a string, because the page has a table of data. They actually do have a public API for an XML file, but it excludes the data I want, which is the "actual" column.

How can I crawl the table and get that "actual" column from the HTML file? I already have the other details, like the title/event name, from their XML file. Need help, thanks.

Upvotes: 0

Views: 66

Answers (1)

MLAlex

Reputation: 142

A good approach is to use Python's requests and BeautifulSoup4 libraries.

First you make an HTTP request with (you guessed it) requests, then you can parse the HTML page with bs4 (BeautifulSoup4):

import requests
from bs4 import BeautifulSoup

# Fetch the page and get its HTML as text
r = requests.get("Your Website").text

# Parse the HTML; the 'lxml' parser requires the lxml package
# (pip install lxml), or use the built-in 'html.parser' instead
soup = BeautifulSoup(r, 'lxml')

Now you can inspect your "soup" and scrape the data you want.
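For example, to pull one column out of a table, you can select the table cells by their CSS class and read their text. This is a minimal sketch using a static HTML string instead of a live request, and the `calendar__actual` / `calendar__event` class names are assumptions for illustration — inspect the real page in your browser's dev tools to find the actual class names:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet resembling a calendar table; in practice you
# would use the HTML returned by requests.get(...).text
html = """
<table>
  <tr><th>Event</th><th>Actual</th></tr>
  <tr><td class="calendar__event">CPI y/y</td><td class="calendar__actual">3.2%</td></tr>
  <tr><td class="calendar__event">NFP</td><td class="calendar__actual">187K</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Select every cell in the "actual" column and extract its text
actuals = [td.get_text(strip=True) for td in soup.select("td.calendar__actual")]
print(actuals)  # ['3.2%', '187K']
```

You can then match each value up with the event names you already have from the XML feed, e.g. by row order or by pairing it with the event cell in the same `<tr>`.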

Upvotes: 1
