Reputation: 896
I'm wondering if anyone with a better understanding of Python and GAE can help me with this. I am uploading a CSV file from a form to the GAE datastore.
class CSVImport(webapp.RequestHandler):
    def post(self):
        csv_file = self.request.get('csv_import')
        fileReader = csv.reader(csv_file)
        for row in fileReader:
            self.response.out.write(row)
I'm running into the same problem that someone else mentions here - http://groups.google.com/group/google-appengine/browse_thread/thread/bb2d0b1a80ca7ac2/861c8241308b9717
That is, csv.reader is iterating over each character rather than each line. A Google engineer left this explanation:
The call self.request.get('csv') returns a String. When you iterate over a string, you iterate over the characters, not the lines. You can see the difference here:
class ProcessUpload(webapp.RequestHandler):
    def post(self):
        self.response.out.write(self.request.get('csv'))
        file = open(os.path.join(os.path.dirname(__file__), 'sample.csv'))
        self.response.out.write(file)
        # Iterating over a file
        fileReader = csv.reader(file)
        for row in fileReader:
            self.response.out.write(row)
        # Iterating over a string
        fileReader = csv.reader(self.request.get('csv'))
        for row in fileReader:
            self.response.out.write(row)
I really don't follow the explanation, and was unsuccessful implementing it. Can anyone provide a clearer explanation of this and a proposed fix?
Thanks, August
Upvotes: 7
Views: 8291
Reputation: 6610
You need to call csv_file = self.request.POST.get("csv_import") and not csv_file = self.request.get("csv_import").
The second one just gives you a string as you mentioned in your original post. But accessing via self.request.POST.get gives you a cgi.FieldStorage object.
This means that you can call csv_file.filename to get the object’s filename and csv_file.type to get the mimetype.
Furthermore, if you access csv_file.file, it’s a StringO object (a read-only object from the StringIO module), not just a string. As ig0774 mentioned in his answer, the StringIO module allows you to treat a string as a file.
Therefore, your code can simply be:
class CSVImport(webapp.RequestHandler):
    def post(self):
        csv_file = self.request.POST.get('csv_import')
        fileReader = csv.reader(csv_file.file)
        for row in fileReader:
            # row is now a list containing all the column data in that row
            self.response.out.write(row)
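The handler above only runs inside the GAE runtime, so as a rough illustration here is a standalone sketch of the same idea in Python 3. The uploaded field is simulated with a stand-in object carrying the filename, type and file attributes described above; fake_upload and rows_from_upload are hypothetical names, not part of webapp. The getattr guard is an assumption about how you might want to treat a request where the field is missing or is not a file at all.

```python
import csv
import io
from types import SimpleNamespace


def rows_from_upload(field):
    """Return the CSV rows of an upload-like object, or [] if it has no file."""
    file_obj = getattr(field, "file", None)
    if file_obj is None:
        # Nothing file-like was posted (missing field, plain string, ...).
        return []
    # csv.reader pulls one line at a time from the file-like object.
    return list(csv.reader(file_obj))


# Hypothetical stand-in for the object returned by
# self.request.POST.get('csv_import'); the contents are made up.
fake_upload = SimpleNamespace(
    filename="sample.csv",
    type="text/csv",
    file=io.StringIO("name,score\nalice,10\nbob,7\n"),
)

rows = rows_from_upload(fake_upload)
print(fake_upload.filename, fake_upload.type)
print(rows)
```

Passing the .file attribute, rather than the field itself, is what gives csv.reader something it can read line by line.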
Upvotes: 0
Reputation: 12838
Short answer, try this:
fileReader = csv.reader(csv_file.split("\n"))
Long answer, consider the following:
for thing in stuff:
    print thing.strip().split(",")
If stuff is a file pointer, each thing is a line. If stuff is a list, each thing is an item. If stuff is a string, each thing is a character.
Iterating over the object returned by csv.reader is going to give you behavior similar to iterating over the object passed in, only with each item CSV-parsed. If you iterate over a string, you'll get a CSV-parsed version of each character.
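To see the difference concretely, here is a small sketch that feeds text to csv.reader once as a bare string and once as a list of lines. It is written for Python 3 (the snippets in this thread are Python 2), and the sample data is made up.

```python
import csv

# A bare string is iterated character by character, so each
# character is parsed as if it were a complete line of CSV.
by_char = list(csv.reader("a,b"))

# A list of lines is iterated line by line, one row per line.
by_line = list(csv.reader("a,b\nc,d".splitlines()))

print(by_char)  # [['a'], ['', ''], ['b']]
print(by_line)  # [['a', 'b'], ['c', 'd']]
```

The sketch uses splitlines() rather than split("\n") so that a trailing newline or "\r\n" line endings do not leave an extra empty piece or a stray "\r" in the list; either call produces a list of lines, which is the point here.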
Upvotes: 13
Reputation: 41257
I can't think of a clearer explanation than what the Google engineer you mentioned said. So let's break it down a bit.
The Python csv module operates on file-like objects, that is, a file or something that behaves like a Python file. Hence, csv.reader() expects to get a file object as its only required parameter.
The webapp.RequestHandler request object provides access to the HTTP parameters that are posted in the form. In HTTP, parameters are posted as key-value pairs, e.g., csv=record_one,record_two. When you invoke self.request.get('csv'), this returns the value associated with the key csv as a Python string. A Python string is not a file-like object. Apparently, the csv module falls back to simply iterating over the object when it does not understand it (in Python, strings can be iterated over by character, e.g., for c in 'Test String': print c will print each character in the string on a separate line).
Fortunately, Python provides a StringIO class that allows a string to be treated as a file-like object. So (assuming GAE supports StringIO, and there's no reason it shouldn't) you should be able to do this:
import csv
import StringIO

class ProcessUpload(webapp.RequestHandler):
    def post(self):
        self.response.out.write(self.request.get('csv'))
        # Iterating over a string as a file
        stringReader = csv.reader(StringIO.StringIO(self.request.get('csv')))
        for row in stringReader:
            self.response.out.write(row)
Which will work as you expect it to.
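For reference, here is a Python 3 sketch of the same approach, where the class lives at io.StringIO rather than in the Python 2 StringIO module. The posted text is made up, and it deliberately includes a quoted field with a line break in it, which is one case where wrapping the string tends to behave better than splitting it on "\n" by hand.

```python
import csv
import io

# Made-up form value; the second record has a quoted, two-line field.
posted = 'name,note\nalice,"line one\nline two"\n'

# Wrap the string so csv.reader can pull lines from it like a file.
# Each line keeps its trailing newline, so the quoted field survives intact.
rows = list(csv.reader(io.StringIO(posted)))

# Splitting by hand throws the newlines away before csv.reader sees them,
# so the two halves of the quoted field are joined with nothing in between.
rows_split = list(csv.reader(posted.split("\n")))

print(rows[1])        # ['alice', 'line one\nline two']
print(rows_split[1])  # ['alice', 'line oneline two']
```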
Edit: I'm assuming that you are using something like a <textarea/> to collect the csv file. If you're uploading an attachment, different handling may be necessary (I'm not all that familiar with Python GAE or how it handles attachments).
Upvotes: 8