Reputation: 13
Hacker News has released an API; how do I use it in Python?
I want to get all the top posts. I tried using urllib2, but I don't think I am doing it right.
Here's my code:
import urllib2
response = urllib2.urlopen('https://hacker-news.firebaseio.com/v0/topstories.json?print=pretty')
html = response.read()
print response.read()
It just prints an empty string: ''
Edit: I missed a line; I have updated my code.
Upvotes: 0
Views: 3633
Reputation: 9636
As @jonrsharpe explained, read() is a one-time operation. So if you print html, you will get the list of all story IDs. That list only contains IDs, though: to get each story, you have to make a separate request for every ID.
First you have to convert the received data to a Python list, then loop over the IDs:
import json
import urllib2

base_url = 'https://hacker-news.firebaseio.com/v0/item/{}.json?print=pretty'
top_story_ids = json.loads(html)  # html is the string read from topstories.json
for story in top_story_ids:
    # one request per story ID
    response = urllib2.urlopen(base_url.format(story))
    print response.read()
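On Python 3, urllib2 no longer exists; urlopen lives in urllib.request instead. A minimal sketch of the same two-step approach (fetch the ID list, then one request per item) might look like this. The helper names fetch_json and fetch_top_stories, the limit parameter and the injectable opener argument are hypothetical, added for illustration only; they are not part of the Hacker News API. It also drops ?print=pretty, which only adds whitespace to the JSON.

```python
import json
from urllib.request import urlopen

TOP_URL = 'https://hacker-news.firebaseio.com/v0/topstories.json'
ITEM_URL = 'https://hacker-news.firebaseio.com/v0/item/{}.json'


def fetch_json(url, opener=urlopen):
    """Open `url`, read the body exactly once and decode it as JSON."""
    response = opener(url)
    try:
        body = response.read()  # read() consumes the body; keep the result
    finally:
        response.close()
    return json.loads(body.decode('utf-8'))


def fetch_top_stories(limit=10, opener=urlopen):
    """Return the item dicts for the first `limit` top story IDs.

    One request is made for the ID list plus one per story, so `limit`
    (a hypothetical parameter) keeps the number of requests small.
    `opener` can be swapped for a stub to test without network access.
    """
    story_ids = fetch_json(TOP_URL, opener)
    stories = []
    for story_id in story_ids[:limit]:
        item = fetch_json(ITEM_URL.format(story_id), opener)
        if item is not None:  # a missing item decodes to None (JSON null); skip it
            stories.append(item)
    return stories
```

Calling fetch_top_stories(limit=5) and printing each story's 'title' entry would list the first five headlines. Passing the opener in is just a design choice so the parsing logic can be checked offline.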
Instead of all this, you could use haxor, a Python wrapper for the Hacker News API. The following code will fetch the IDs of all top stories:
from hackernews import HackerNews
hn = HackerNews()
top_story_ids = hn.top_stories()
# >>> top_story_ids
# [8432709, 8432616, 8433237, ...]
Then you can loop through that list and print each story, for example:
for story in top_story_ids:
    print hn.get_item(story)
Disclaimer: I wrote haxor.
Upvotes: 5
Reputation: 122137
You should print html instead of print response.read().
Why? Because read() is a one-time operation: it consumes the response body, so after you've done it, a second call just returns an empty string:
>>> import urllib2
>>> response = urllib2.urlopen('https://hacker-news.firebaseio.com/v0/topstories.json?print=pretty')
>>> response.read()
'[ 8445087, 8444739, 8444603, 8443981, 8444976, 8443902, 8444252, 8444634, 8444931, 8444272, 8444025, 8441939, 8444510, 8444640, 8443830, 8445076, 8443470, 8444785, 8443028, 8444077, 8444832, 8443841, 8443467, 8443309, 8443187, 8443896, 8444971, 8443360, 8444601, 8443287, 8441095, 8441681, 8441055, 8442712, 8444909, 8443621, 8442596, 8443836, 8442266, 8443298, 8445122, 8443096, 8441699, 8442119, 8442965, 8440486, 8442093, 8443393, 8442067, 8444989, 8440985, 8444622, 8438728, 8442555, 8444880, 8442004, 8443185, 8444370, 8436210, 8437671, 8439641, 8443727, 8441702, 8436309, 8441041, 8437367, 8422087, 8441711, 8438063, 8444212, 8439408, 8442049, 8440989, 8439367, 8438515, 8437403, 8435278, 8442486, 8442730, 8428522, 8438904, 8443450, 8432703, 8430412, 8422928, 8443635, 8439267, 8440191, 8439560, 8437230, 8442556, 8439977, 8444140, 8441682, 8443776, 8441209, 8428632, 8441388, 8422599, 8439547 ]\n'
>>> response.read()
''
In your case, though, you've assigned the string from read() to the name html, so you can still access it.
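To see the one-time behaviour of read() without any network access, here is a small Python 3 sketch that uses io.BytesIO as a stand-in for the response object. That substitution is an assumption for illustration: both are file-like objects, so read() returns everything that is left and then an empty result once the stream is exhausted.

```python
import io

# io.BytesIO stands in for the object returned by urlopen()
response = io.BytesIO(b'[ 8445087, 8444739, 8444603 ]\n')

html = response.read()   # first read returns the whole body
again = response.read()  # the stream is now exhausted

print(html)   # the JSON text is still available through `html`
print(again)  # prints b'' because nothing is left to read
```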
Once you have the story IDs, you can access each one via '.../v0/item/{item number}.json?print=pretty':
>>> response = urllib2.urlopen('https://hacker-news.firebaseio.com/v0/item/8445087.json?print=pretty')
>>> print response.read()
{
"by" : "lalmachado",
"id" : 8445087,
"kids" : [ 8445205, 8445195, 8445173, 8445103 ],
"score" : 21,
"text" : "",
"time" : 1413116430,
"title" : "Show HN: Powerful ASCII art editor designed for the Mac",
"type" : "story",
"url" : "http://monodraw.helftone.com/"
}
You should read through the API documentation before continuing. It's also worth getting to grips with the json module.
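As a small taste of the json module, the Python 3 sketch below decodes the item response shown above (pasted in as a string, so no network is needed) into a dict and reads a few fields. Per the API documentation, "time" is a Unix timestamp and "kids" holds the IDs of the item's comments, which you would fetch through the same item endpoint; the variable names here are just illustrative.

```python
import json
from datetime import datetime, timezone

# The item response from the example above, pasted in as a string
item_text = '''{
  "by" : "lalmachado",
  "id" : 8445087,
  "kids" : [ 8445205, 8445195, 8445173, 8445103 ],
  "score" : 21,
  "text" : "",
  "time" : 1413116430,
  "title" : "Show HN: Powerful ASCII art editor designed for the Mac",
  "type" : "story",
  "url" : "http://monodraw.helftone.com/"
}'''

item = json.loads(item_text)  # a JSON object decodes to a Python dict

# "time" is a Unix timestamp; convert it to a timezone-aware UTC datetime
posted = datetime.fromtimestamp(item['time'], tz=timezone.utc)

print(item['title'], '-', item['url'])
print('score:', item['score'], 'posted:', posted.isoformat())
# "kids" lists the IDs of the story's comments
print(len(item['kids']), 'comment IDs:', item['kids'])
```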
Upvotes: 1