Babababa31

Reputation: 21

Download content of Webpage

I need to download the content of a web page using Python.
What I need is the TLE of a specific satellite from Space-Track.org website.
An example of the url I need to scrape is the following:

https://www.space-track.org/basicspacedata/query/class/gp/NORAD_CAT_ID/44235/format/tle/emptyresult/show

Below is the unsuccessful code I wrote/copied:

import requests

url = 'https://www.space-track.org/basicspacedata/query/class/gp/NORAD_CAT_ID/44235/format/tle/emptyresult/show'
res = requests.post(url)
html_page = res.content

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_page, 'html.parser')
text = soup.find_all(text=True)
print(text)

requests.post(url) returns <Response [204]> and I can't access the content of the webpage.
Could this happen because of the required login?
I must admit that I am not experienced with Python and I don't have the knowledge to do this myself.
What I can do is manipulate text files: from the DevTools page I can get the HTML file and extract the text, but how can I do this programmatically?

Upvotes: 0

Views: 161

Answers (2)

x52

Reputation: 11

I don't have access to this API, so take my advice with a grain of salt, but you should also try using requests.get instead of requests.post.

Why? Because requests.post POSTs data to the server, while requests.get GETs data from the server. GET and POST are known as HTTP methods, and to learn more about them, see https://www.tutorialspoint.com/http/http_methods.htm. Since web browsers use GET, you should give that a try.
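To make the difference concrete, here is a small offline sketch using only the standard library (urllib instead of requests, so nothing is actually sent). It only shows which HTTP method would be used and what status 204 means; the URL is the one from the question.

```python
from http import HTTPStatus
from urllib.request import Request

# URL taken from the question.
URL = ("https://www.space-track.org/basicspacedata/query/class/gp/"
       "NORAD_CAT_ID/44235/format/tle/emptyresult/show")

get_request = Request(URL)             # no body, so urllib would send a GET
post_request = Request(URL, data=b"")  # any body (even empty) makes it a POST

print(get_request.get_method())   # GET
print(post_request.get_method())  # POST

# 204 means the server accepted the request but returned no body.
print(HTTPStatus(204).phrase)     # No Content
```

With requests, the equivalent change is simply calling requests.get(url) instead of requests.post(url). Whether that alone is enough depends on whether the site also requires a login.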

Upvotes: 0

KKU

Reputation: 29

To access the URL you mentioned, you need to authenticate with a username and password.

To do this (customize to your needs):

import http.cookiejar  # cookielib in Python 2

import mechanize

# Keep cookies between requests so the login session persists.
cj = http.cookiejar.CookieJar()
br = mechanize.Browser()
br.set_cookiejar(cj)
# Example login page; replace it with the login page of the site you need.
br.open("https://id.arduino.cc/auth/login/")

# Fill in and submit the first form on the page (placeholder credentials).
br.select_form(nr=0)
br.form['username'] = 'username'
br.form['password'] = 'password.'
br.submit()

print(br.response().read())
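For Space-Track specifically, a cookie-based login along the same lines might look like the sketch below. It uses only the standard library. The login endpoint (/ajaxauth/login) and the form field names (identity, password) are assumptions based on my reading of Space-Track's API documentation, so verify them there before relying on this. The helper names tle_url and fetch_tle are made up for illustration.

```python
import http.cookiejar
import urllib.parse
import urllib.request

BASE = "https://www.space-track.org"
# Assumed login endpoint; check Space-Track's API documentation.
LOGIN_URL = BASE + "/ajaxauth/login"


def tle_url(norad_id):
    """Build the query URL from the question for a given NORAD catalog ID."""
    return (f"{BASE}/basicspacedata/query/class/gp/NORAD_CAT_ID/{norad_id}"
            "/format/tle/emptyresult/show")


def fetch_tle(username, password, norad_id):
    """Log in, then GET the TLE text using the same cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    # Assumed field names: "identity" and "password".
    credentials = urllib.parse.urlencode(
        {"identity": username, "password": password}).encode()
    # Passing data makes this a POST; the session cookie is stored in the jar.
    with opener.open(LOGIN_URL, data=credentials, timeout=30):
        pass
    # No data here, so this is a plain GET sent with the stored cookie.
    with opener.open(tle_url(norad_id), timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling print(fetch_tle("your_username", "your_password", 44235)) with real credentials should print the TLE lines if the assumptions above hold. The response is plain text, so BeautifulSoup should not be needed.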

Upvotes: 1
