How to export the contents of a blog to JSON?

Question

I am working on a blog and learning web development at the same time. I want to learn more about JSON, so I am trying to implement a way to export the entire contents of my blog to JSON, and later to XML. I am hitting a lot of problems along the way, the biggest one being how to dynamically get the URL of the page I want to render as JSON/XML. The code for my website can be found here. I still need to add more comments and implement a lot of functionality. The main class responsible for exporting the contents to JSON is as follows:

class JSONHandler(BaseHandler):
    # TODO: find a way to get the URL from the request
    def get(self):
        self.response.headers['Content-Type'] = 'application/json'
        url = "http://www.bigb-myapp.appspot.com/blog"
        #url = self.request.path_url
        logging.info(url)
        page = urllib2.urlopen(url).read()
        soup = BeautifulSoup(page)
        subject_list = []
        day_list = []
        content_list = []

        subjects = soup.findAll('div', {'class' : 'subject-title'})
        days = soup.findAll('div', {'class' : 'day'})
        contents = soup.findAll('div', {'class' : 'post'})

        for subject in subjects:
            subject_list.append(subject.findAll(text = True))

        for day in days:
            day_list.append(day.findAll(text = True))

        for content in contents:
            content_list.append(content.findAll(text = True))

        i = 0

        for s, d, c in subject_list, day_list, content_list:
            json_text = json.dumps({'subject': s[i][i],'day': d[i][i], 'content': c[i][i]})
            i += 1

        self.write(json_text)

I am also sure that the printing function is erroneous, but that is the easy part. As I said, getting the URL is proving to be the major difficulty.

I have tried to get the URL from the environment variable, and I have also tried webapp2's request attributes such as self.request.path_url, to no avail.

I am working with Google App Engine and use the jinja2 template engine.

Thanks.

Ofir Israel · Accepted Answer

self.request.url or self.request.path should do the trick. However, a better way is to do something similar to what you did in the permalink section: just parse the post id from the request. That means you should separate JSONHandler so it handles two things: a) returning the entire blog, and b) returning an individual post.
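Parsing the post id from the request path could look roughly like the sketch below. The URL pattern and the names POST_JSON_PATH and parse_post_id are assumptions made up for illustration; they are not taken from the blog's code.

```python
import re

# Hypothetical URL pattern for the JSON view of a single post.
# The group (\d+) captures the numeric post id; the dot is escaped
# so that it only matches a literal ".".
POST_JSON_PATH = re.compile(r'^/blog/(\d+)\.json$')


def parse_post_id(path):
    """Return the post id as an int, or None if the path is not a post URL."""
    match = POST_JSON_PATH.match(path)
    return int(match.group(1)) if match else None
```

With this sketch, parse_post_id('/blog/42.json') gives 42 and parse_post_id('/blog.json') gives None. In webapp2 you normally don't need to do this by hand: a group captured in the route pattern is passed to the handler as an argument, e.g. def get(self, post_id).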

I'd also suggest not fetching the blog posts the way you do now. In the Mainpage class you do it so elegantly with GQL, so why do it with urllib2 and BeautifulSoup?

As for the last question, about writing the response: the correct way is self.response.out.write("something")

EDITED TO ADD:
I meant splitting JSONHandler into two parts, so that there are two handlers: ('/blog/(\d+).json', PermalinkJSONHandler), ('/blog.json', FullJSONHandler),...

Both should be about the same (even using the same function for dumping the json) just with different GQLs to get the correct information.
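The shared dumping function could be sketched like this. Post is a hypothetical stand-in for whatever entity the GQL query returns, and the field names subject, day and content are assumed from the keys used in the question; adjust them to the real model.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for a datastore entity returned by the GQL query.
Post = namedtuple('Post', ['subject', 'day', 'content'])


def post_to_dict(post):
    """Convert one post into a plain dict that json.dumps can serialize."""
    return {'subject': post.subject, 'day': post.day, 'content': post.content}


def posts_to_json(posts):
    """Serialize a sequence of posts (the whole blog) as a JSON array."""
    return json.dumps([post_to_dict(p) for p in posts])


def single_post_to_json(post):
    """Serialize one post (the permalink case) as a JSON object."""
    return json.dumps(post_to_dict(post))
```

Each handler would then run its own query and pass the result to self.response.out.write(posts_to_json(posts)) or self.response.out.write(single_post_to_json(post)).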
