Reputation: 2569
I am very new to Python and GAE, but I am attempting to download an XML file from the eventful.com API, parse it, and then store this information in a database on Google Cloud SQL.
My code so far is below. I wrote it after looking at various online tutorials, but I keep receiving errors and the code will not work at all. If anyone has any pointers on where I am going wrong, please let me know. Karen.
My attempt to call the Eventful XML file and parse it:
import webapp2
from google.appengine.ext.webapp import template
import os
import datetime
from google.appengine.ext import db
from google.appengine.api import urlfetch
import urllib #import python library which does http requests
from xml.dom import parseString #imports xml parser called minidom
class XMLParser(webapp2.RequestHandler):
    def get(self):
        base_url = fetch('http://api.eventful.com/rest/events/search?app_key=zGtDX6cwQ=dublin&?q=music')
        #downloads data from xml file
        response = urllib.urlopen(base_url)
        #converts data to string:
        data = response.read()
        #closes file
        response.close()
        #parses xml downloaded
        dom = parseString(data)
        #retrieves the first xml tag that the parser finds with name tag
        xmlTag = dom.getElementsByTagName('title')[0].toxml()
        #strip off the tag to just reveal event name
        xmlData = xmlTag.replace('<title>', '').replace('</title>', '')
        #print out the xml tag and data in this format:
        print xmlTag
        #just print the data
        print xmlData
However, I receive the following errors when I try to run this code in Google App Engine using the GAE launcher:
2013-04-15 16:52:05 Running command: "['C:\\Python27\\python.exe', 'C:\\Program Files (x86)\\Google\\google_appengine\\dev_appserver.py', '--skip_sdk_update_check=yes', '--port=8080', '--admin_port=8002', u'C:\\Users\\Karen\\Desktop\\Development\\own_tada']"
INFO 2013-04-15 16:52:17,944 devappserver2.py:498] Skipping SDK update check.
WARNING 2013-04-15 16:52:18,005 api_server.py:328] Could not initialize images API; you are likely missing the Python "PIL" module.
INFO 2013-04-15 16:52:18,065 api_server.py:152] Starting API server at: http://localhost:54619
INFO 2013-04-15 16:52:18,085 dispatcher.py:150] Starting server "default" running at: http://localhost:8080
INFO 2013-04-15 16:52:18,095 admin_server.py:117] Starting admin server at: http://localhost:8002
ERROR 2013-04-15 15:52:35,767 wsgi.py:219]
Traceback (most recent call last):
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\runtime\wsgi.py", line 196, in Handle
handler = _config_handle.add_wsgi_middleware(self._LoadHandler())
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\runtime\wsgi.py", line 255, in _LoadHandler
handler = __import__(path[0])
File "C:\Users\Karen\Desktop\Development\own_tada\own.py", line 8, in <module>
from xml.dom import parseString #imports xml parser called minidom
ImportError: cannot import name parseString
INFO 2013-04-15 16:52:35,822 server.py:561] default: "GET / HTTP/1.1" 500 -
ERROR 2013-04-15 15:52:37,586 wsgi.py:219]
Traceback (most recent call last):
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\runtime\wsgi.py", line 196, in Handle
handler = _config_handle.add_wsgi_middleware(self._LoadHandler())
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\runtime\wsgi.py", line 255, in _LoadHandler
handler = __import__(path[0])
File "C:\Users\Karen\Desktop\Development\own_tada\own.py", line 8, in <module>
from xml.dom import parseString #imports xml parser called minidom
ImportError: cannot import name parseString
INFO 2013-04-15 16:52:37,617 server.py:561] default: "GET /favicon.ico HTTP/1.1" 500 -
One such tutorial I used for the above code comes from the following URL: http://www.travisglines.com/web-coding/python-xml-parser-tutorial
EDIT:
Thanks to the help provided by Josh below, I no longer receive any errors when I launch my code. However, I only see a blank screen, and I want it to print out the parsed information (or its progress so far). I know this may seem like a very stupid question, but I really am a beginner, so I'm sorry! The fixed code (minus errors) is:
import webapp2
from google.appengine.ext.webapp import template
import os
import datetime
from google.appengine.ext import db
from google.appengine.api import urlfetch
import urllib #import python library which does http requests
import xml.dom.minidom as mdom #imports xml parser called minidom
class XMLParser(webapp2.RequestHandler):
    def get(self):
        base_url = 'http://api.eventful.com/rest/events/search?app_key=zGtDX6cwQjCRdkf6&l=dublin&?q=music'
        #downloads data from xml file
        response = urllib.urlopen(base_url)
        #converts data to string:
        data = response.read()
        #closes file
        response.close()
        #parses xml downloaded
        dom = mdom.parseString(data)
        #retrieves the first xml tag that the parser finds with name tag
        xmlTag = dom.getElementsByTagName('title')[0].toxml()
        #strip off the tag to just reveal event name
        xmlData = xmlTag.replace('<title>', '').replace('</title>', '')
        #print out the xml tag and data in this format:
        print xmlTag
        #just print the data
        print xmlData

app = webapp2.WSGIApplication([('/', XMLParser),
                               ],
                              debug=True)
Any guidance on what to do next, or anything you can spot that is wrong with my Python code, would be greatly appreciated. Thank you!
Upvotes: 0
Views: 1970
Reputation: 8221
App Engine supports lxml; it is very simple to include it and parse your document with it.
In your app.yaml file:
libraries:
- name: lxml
  version: latest
Then import lxml and follow its parsing instructions.
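A minimal sketch of what the parsing step might look like, assuming the feed contains <event> elements that each have a <title> child. The sample XML and the event_titles helper are invented for illustration and are not actual Eventful output. The sketch uses lxml.etree when it is installed and falls back to the standard library's xml.etree.ElementTree, which provides the same fromstring and iter calls used here:

```python
try:
    from lxml import etree  # available on App Engine once declared in app.yaml
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with a compatible API

# Hypothetical sample data; the real response layout is an assumption.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<search>
  <events>
    <event id="E1"><title>Trad Session</title></event>
    <event id="E2"><title>Jazz Night</title></event>
  </events>
</search>"""


def event_titles(xml_bytes):
    """Return the text of every <title> element in the document."""
    root = etree.fromstring(xml_bytes)
    return [node.text for node in root.iter('title')]


titles = event_titles(SAMPLE_XML)
for title in titles:
    print(title)
```

In the handler from the question, the bytes returned by response.read() would take the place of SAMPLE_XML.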
Upvotes: 2
Reputation: 118
This should fix your imports:
import xml.dom.minidom as mdom
and parse the raw data with:
dom = mdom.parseString(data)
As for the data manipulation, you will want to look into the childNodes and data attributes of the elements returned from parseString, such as:
for element in dom.getElementsByTagName('title')[0].childNodes:
    print element.data
to see the structure once it's been parsed.
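As a rough sketch of the childNodes approach, here is a small helper that collects the text of every <title> element. The sample XML and the title_texts name are made up for illustration; the real Eventful response may be structured differently:

```python
import xml.dom.minidom as mdom

# Hypothetical sample data standing in for the downloaded response.
SAMPLE_XML = """<search>
  <events>
    <event id="E1"><title>Trad Session</title></event>
    <event id="E2"><title>Jazz Night</title></event>
  </events>
</search>"""


def title_texts(xml_text):
    """Collect the text content of each <title> element."""
    dom = mdom.parseString(xml_text)
    titles = []
    for title in dom.getElementsByTagName('title'):
        # Join text nodes only; element children are skipped.
        text = ''.join(child.data for child in title.childNodes
                       if child.nodeType == child.TEXT_NODE)
        titles.append(text)
    return titles


titles = title_texts(SAMPLE_XML)
print('\n'.join(titles))
```

Note that inside a webapp2 handler, print does not send anything to the browser, which is the likely cause of the blank screen. Writing the result with self.response.write('\n'.join(titles)) should make it appear on the page.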
Upvotes: 1