Reputation: 511
I am trying to save a file (audio/mp3 in this case) to the App Engine blobstore, but with mixed success. Everything seems to work: a file is saved in the blobstore, of the right type, but it is essentially empty (1.5kB vs. the expected 6.5kB) and so won't play. The URL in question is http://translate.google.com/translate_tts?ie=UTF-8&tl=en&q=revenues+in+new+york+were+56+million
The App Engine logs do not show anything unusual; all parts are executing as expected... Any pointers would be appreciated!

    import hashlib
    import logging
    import urllib

    import webapp2
    from google.appengine.api import files


    class Dictation(webapp2.RequestHandler):
        def post(self):
            sentence = self.request.get('words')
            # Google Translate API cannot handle strings > 100 characters
            sentence = sentence[:100]
            # Replace the non-alphanumeric characters
            # The spaces in the sentence are replaced with the Plus symbol
            sentence = urllib.urlencode({'q': sentence})
            # Name of the MP3 file generated using the MD5 hash
            mp3_file = hashlib.md5(sentence).hexdigest()
            # Use the hash plus the .mp3 extension as the uploaded filename
            mp3_file = mp3_file + ".mp3"
            # Create the full URL
            url = 'http://translate.google.com/translate_tts?ie=UTF-8&tl=en&' + sentence
            # Create a writable blobstore file (mp3_file now holds its path, not the name)
            mp3_file = files.blobstore.create(mime_type='audio/mp3', _blobinfo_uploaded_filename=mp3_file)
            # urllib.urlopen returns the response body even for HTTP errors such as 403
            mp3 = urllib.urlopen(url).read()
            with files.open(mp3_file, 'a') as f:
                f.write(mp3)
            files.finalize(mp3_file)
            blob_key = files.blobstore.get_blob_key(mp3_file)
            logging.info('blob_key identified as %s', blob_key)

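For reference, the handler's URL-building steps can be replayed outside App Engine to check that it requests exactly the URL quoted above. This is a sketch for Python 3, so urllib.parse.urlencode stands in for Python 2's urllib.urlencode and the query string is encoded to bytes before hashing; build_tts_request and TTS_BASE are hypothetical names, not part of the original code.

```python
import hashlib
from urllib.parse import urlencode

# Base URL copied from the question (hypothetical constant name).
TTS_BASE = "http://translate.google.com/translate_tts?ie=UTF-8&tl=en&"


def build_tts_request(words):
    """Mirror the handler: truncate, form-encode, hash, build the URL.

    Hypothetical helper for illustration; returns (url, filename).
    """
    sentence = words[:100]               # same 100-character cut as the handler
    query = urlencode({"q": sentence})   # spaces become '+', other unsafe chars %-escaped
    filename = hashlib.md5(query.encode("utf-8")).hexdigest() + ".mp3"
    return TTS_BASE + query, filename


url, filename = build_tts_request("revenues in new york were 56 million")
print(url)
# -> http://translate.google.com/translate_tts?ie=UTF-8&tl=en&q=revenues+in+new+york+were+56+million
print(filename)  # 32 hex characters of MD5 followed by ".mp3"
```

The URL matches the one in the question character for character, which is consistent with the code fetching the right resource.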
Upvotes: 0
Views: 805
Reputation: 366133
The problem has nothing to do with your code; it is correctly retrieving the data from the URL you gave.
For example, if I try this at the command line (with the URL quoted, so that the shell does not split the command at each &):

    $ curl -O 'http://translate.google.com/translate_tts?ie=UTF-8&tl=en&q=revenues+in+new+york+were+56+million'

I get a 1.5kB 403 error page, whose contents say:
403. That's an error.
Your client does not have permission to get URL /translate_tts?ie=UTF-8&tl=en&q=revenues+in+new+york+were+56+million from this server. (Client IP address: 1.2.3.4)
That’s all we know.
And your code does the exact same thing, whether run in GAE or directly in the interactive interpreter.
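Part of why this went unnoticed is that Python 2's urllib.urlopen does not raise on HTTP errors: it returns the body of the 403 page, which the handler then stores as if it were audio. Below is a minimal sketch of a guard, assuming Python 3's urllib.request (whose urlopen raises urllib.error.HTTPError on a 403) plus a check on the bytes before anything is written. The helper names looks_like_mp3, check_audio_response and fetch_audio are hypothetical, and the MP3 signature test (an ID3v2 tag or an MPEG frame sync) is a rough heuristic, not full validation.

```python
import urllib.request


def looks_like_mp3(body):
    """Heuristic: MP3 data starts with an ID3v2 tag or an MPEG frame sync."""
    if body.startswith(b"ID3"):
        return True
    # MPEG audio frame header: the first 11 bits are all set.
    return len(body) >= 2 and body[0] == 0xFF and (body[1] & 0xE0) == 0xE0


def check_audio_response(status, content_type, body):
    """Return body if it plausibly is MP3 audio, otherwise raise ValueError."""
    if status != 200:
        raise ValueError("HTTP %d: refusing to store an error page" % status)
    if not content_type.startswith("audio/"):
        raise ValueError("unexpected Content-Type %r" % content_type)
    if not looks_like_mp3(body):
        raise ValueError("body does not look like MP3 data")
    return body


def fetch_audio(url, timeout=10.0):
    """Fetch url; urlopen itself raises urllib.error.HTTPError on a 403."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        content_type = resp.headers.get("Content-Type", "")
        return check_audio_response(resp.status, content_type, resp.read())
```

With a check like this in front of the blobstore write, the handler would fail loudly instead of finalizing a 1.5kB blob of HTML. It does not, of course, make the request succeed.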
Most likely, the reason it works in your browser is that you do have permissions. So, what does that mean? It could mean that your browser has a valid SID cookie from google.com but your script doesn't. Or it could mean that your browser's user agent is recognized as something that can play HTML5 audio, but your script's isn't. Or…
Well, you can try to reverse-engineer what differs in the cookies, headers, etc. between your browser and your script, narrow it down to the relevant difference, and then send explicit headers, cookies, or whatever else you need to work around the problem.
But it will just break the next time Google changes anything.
And Google will probably not be happy with you if you try this. They offer a Google Translate API service that they want you to use, and they got rid of all of the free options for that API because of "substantial economic burden caused by extensive abuse." Trying to publish a Google App Engine web service that evades Google's API pricing by scraping their pages is probably not the kind of thing they enjoy their customers doing.
Upvotes: 2