August Flanagan

Reputation: 896

Upload and parse csv file with google app engine

I'm wondering if anyone with a better understanding of Python and GAE can help me with this. I am uploading a CSV file from a form to the GAE datastore.

class CSVImport(webapp.RequestHandler):
  def post(self):
    csv_file = self.request.get('csv_import')
    fileReader = csv.reader(csv_file)
    for row in fileReader:
      self.response.out.write(row)

I'm running into the same problem that someone else mentions here - http://groups.google.com/group/google-appengine/browse_thread/thread/bb2d0b1a80ca7ac2/861c8241308b9717

That is, csv.reader is iterating over each character rather than each line. A Google engineer left this explanation:

The call self.request.get('csv') returns a String. When you iterate over a string, you iterate over the characters, not the lines. You can see the difference here:

 class ProcessUpload(webapp.RequestHandler): 
   def post(self): 
     self.response.out.write(self.request.get('csv')) 
     file = open(os.path.join(os.path.dirname(__file__), 'sample.csv')) 
     self.response.out.write(file) 

     # Iterating over a file 
     fileReader = csv.reader(file) 
     for row in fileReader: 
       self.response.out.write(row) 

     # Iterating over a string 
     fileReader = csv.reader(self.request.get('csv')) 
     for row in fileReader: 
       self.response.out.write(row) 

I really don't follow the explanation, and was unsuccessful implementing it. Can anyone provide a clearer explanation of this and a proposed fix?

Thanks, August

Upvotes: 7

Views: 8291

Answers (3)

Sam

Reputation: 6610

You need to call csv_file = self.request.POST.get("csv_import") and not csv_file = self.request.get("csv_import").

The second one just gives you a string as you mentioned in your original post. But accessing via self.request.POST.get gives you a cgi.FieldStorage object.

This means that you can call csv_file.filename to get the object’s filename and csv_file.type to get the mimetype. Furthermore, if you access csv_file.file, it’s a StringO object (a read-only object from the StringIO module), not just a string. As ig0774 mentioned in his answer, the StringIO module allows you to treat a string as a file.

Therefore, your code can simply be:

class CSVImport(webapp.RequestHandler):
  def post(self):
    csv_file = self.request.POST.get('csv_import')
    fileReader = csv.reader(csv_file.file)
    for row in fileReader:
      # row is now a list containing all the column data in that row
      self.response.out.write(row)
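To try the idea outside App Engine, here is a minimal sketch in Python 3 syntax. FakeUpload is a hypothetical stand-in I made up for the uploaded-field object (it is not a GAE or cgi class); it only mimics the filename, type and file attributes described above, and the sample data is invented.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for the uploaded-field object; not a GAE or cgi class.
FakeUpload = namedtuple("FakeUpload", ["filename", "type", "file"])

upload = FakeUpload(
    filename="sample.csv",
    type="text/csv",
    file=io.StringIO("name,score\nada,1\n"),  # invented sample data
)

# The .file attribute is file-like, so csv.reader receives whole lines.
rows = list(csv.reader(upload.file))
print(upload.filename, upload.type, rows)
```

The point is only that csv.reader gets something that yields lines, rather than a bare string.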

Upvotes: 0

Drew Sears

Reputation: 12838

Short answer, try this:

fileReader = csv.reader(csv_file.split("\n"))

Long answer, consider the following:

for thing in stuff:
  print thing.strip().split(",")

If stuff is a file pointer, each thing is a line. If stuff is a list, each thing is an item. If stuff is a string, each thing is a character.

Iterating over the object returned by csv.reader is going to give you behavior similar to iterating over the object passed in, only with each item CSV-parsed. If you iterate over a string, you'll get a CSV-parsed version of each character.
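To see the difference for yourself, here is a small sketch in Python 3 syntax (the original code in this thread is Python 2). The sample text is made up for illustration.

```python
import csv

text = "a,b\nc,d"  # made-up sample data

# Iterating over a string yields characters, so csv.reader parses
# each character as if it were a complete line.
rows_from_string = list(csv.reader(text))

# Iterating over a list yields its items, so csv.reader parses
# one full line at a time.
rows_from_lines = list(csv.reader(text.split("\n")))

print(rows_from_string)
print(rows_from_lines)
```

If the upload might use Windows line endings, text.splitlines() is a reasonable alternative to split("\n"), since it splits on "\r\n" as well.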

Upvotes: 13

ig0774

Reputation: 41257

I can't think of a clearer explanation than what the Google engineer you mentioned said. So let's break it down a bit.

The Python csv module operates on file-like objects, that is, a file or something that behaves like a Python file. Hence, csv.reader() expects a file-like object (or, more generally, any iterable that yields one line of text at a time) as its only required parameter.

The webapp.RequestHandler request object provides access to the HTTP parameters that are posted in the form. In HTTP, parameters are posted as key-value pairs, e.g., csv=record_one,record_two. When you invoke self.request.get('csv'), it returns the value associated with the key csv as a Python string. A Python string is not a file-like object. Apparently, when the csv module is handed something other than a file, it simply iterates over it, and in Python iterating over a string yields one character at a time (e.g., for c in 'Test String': print c will print each character in the string on a separate line).

Fortunately, Python provides a StringIO class that allows a string to be treated as a file-like object. So (assuming GAE supports StringIO, and there's no reason it shouldn't) you should be able to do this:

import csv
import StringIO

from google.appengine.ext import webapp

class ProcessUpload(webapp.RequestHandler):
  def post(self):
    self.response.out.write(self.request.get('csv'))

    # Iterating over a string as a file
    stringReader = csv.reader(StringIO.StringIO(self.request.get('csv')))
    for row in stringReader:
      self.response.out.write(row)

This should work as you expect it to.

Edit: I'm assuming that you are using something like a <textarea/> to collect the csv file. If you're uploading an attachment, different handling may be necessary (I'm not all that familiar with Python GAE or how it handles attachments).
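For anyone who wants to check the StringIO approach without a GAE handler, here is a rough standalone sketch in Python 3 syntax, where the class lives at io.StringIO rather than StringIO.StringIO. The helper name parse_csv_text and the sample data are my own, purely for illustration.

```python
import csv
import io


def parse_csv_text(text):
    """Return the rows parsed from CSV data held in a string."""
    # newline="" leaves line endings untouched, so the csv module
    # handles "\r\n" and "\n" itself.
    return list(csv.reader(io.StringIO(text, newline="")))


# Invented sample: Windows line endings and a quoted field with a comma.
rows = parse_csv_text('name,notes\r\nada,"likes a, b"\r\n')
print(rows)
```

Wrapping the string this way gives csv.reader a file-like object that yields lines, which is all it needs.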

Upvotes: 8
