user1938357

Reputation: 1466

Set request/response encoding to UTF-8 to store text in the App Engine datastore

In my Android app, I save some data to the local file system and read it back to display in an Android activity. I store the text as UTF-8, so I can save and display files in multiple languages. This works fine. My app is also connected to Google App Engine, and I use the App Engine datastore to store some data. In some cases I save my text to the App Engine datastore and retrieve it later. In the process I noticed that there is no way to specify UTF-8 encoding for my text while saving. Is there any way to ensure that UTF-8 is used when saving text to and retrieving it from the App Engine datastore?

Sample code for the datastore insert operation is added below:

    public class EndpointsInsertUpdateQuizContentTask extends AsyncTask<Context, Integer, Long> {

        @Override
        protected Long doInBackground(Context... contexts) {
            Quizcontenttableendpoint.Builder endpointBuilder = new Quizcontenttableendpoint.Builder(
                    AndroidHttp.newCompatibleTransport(), new JacksonFactory(), new HttpRequestInitializer() {
                        public void initialize(HttpRequest httpRequest) { }
                    });
            Quizcontenttableendpoint endpoint = CloudEndpointUtils.updateBuilder(endpointBuilder).build();
            try {
                // get local file content into a string
                int ch;
                StringBuffer fileContent = new StringBuffer("");
                FileInputStream fis;
                //String quizContentString;
                fis = getBaseContext().openFileInput(selectedQuiz);
                while ((ch = fis.read()) != -1)
                    fileContent.append((char) ch);
                String quizContentString = new String(fileContent);

                QuizContentTable quizContentTable = new QuizContentTable();
                quizContentTable.setQuizKey(quizKey);
                quizContentTable.setQuizContent(quizContentString);

                quizContentResult = endpoint.insertQuizContentTable(quizContentTable).execute();
            } catch (Exception e) {
                errMsg = e.toString();
            }
            return (long) 0;
        }

        private ProgressDialog pdia;

        @Override
        protected void onPreExecute() {
            super.onPreExecute();
            pdia = new ProgressDialog(ctx);
            pdia.setMessage("Loading");
            pdia.show();
        }

        @Override
        protected void onPostExecute(Long result1) {
            pdia.dismiss();
        }
    }

I searched for similar issues in other Stack Overflow questions, and some of them hinted at setting the UTF-8 encoding on the request and response. I am not sure where to set that in my Android app. In the App Engine code I can only specify and edit the entities, their fields, and the fields' data types. I then generate the corresponding endpoint library in my project. Where do I set the request/response encoding for App Engine?

Upvotes: 0

Views: 1422

Answers (2)

user1938357

Reputation: 1466

I resolved the issue. The problem was that I had not specified UTF-8 encoding when reading the file content into the string. I replaced the file-reading part of the code above with the following:

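    String str;
    StringBuffer fileContent = new StringBuffer("");
    // Decode the file's bytes as UTF-8 instead of casting each byte to a char
    BufferedReader in = new BufferedReader(new InputStreamReader(getBaseContext().openFileInput(selectedQuiz), "UTF-8"));
    // Note: readLine() strips line terminators, so line breaks are not preserved
    while ((str = in.readLine()) != null)
        fileContent.append(str);
    String quizContentString = new String(fileContent);
    in.close();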
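
To illustrate why the original byte-by-byte loop garbled the text, here is a minimal standalone sketch in plain Java (no Android APIs). The class and method names are hypothetical and not part of the app; an in-memory stream stands in for the file returned by openFileInput. Unlike the readLine() loop above, this version reads into a char buffer, so line breaks are kept.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
final class Utf8ReadDemo {

    // Mirrors the loop from the question: every byte is cast to a char on
    // its own, so each multi-byte UTF-8 sequence becomes several wrong chars.
    static String readByteByByte(InputStream in) {
        try {
            StringBuilder sb = new StringBuilder();
            int b;
            while ((b = in.read()) != -1) {
                sb.append((char) b);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes the stream as UTF-8 through an InputStreamReader and copies
    // chars in blocks, which keeps line terminators intact.
    static String readAsUtf8(InputStream in) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // An 8-letter Cyrillic word written with Unicode escapes, then a second line.
        String original = "\u0437\u043d\u0430\u0447\u0435\u043d\u0438\u0435\nquiz";
        byte[] fileBytes = original.getBytes(StandardCharsets.UTF_8);

        String broken = readByteByByte(new ByteArrayInputStream(fileBytes));
        String decoded = readAsUtf8(new ByteArrayInputStream(fileBytes));

        System.out.println("byte-by-byte equals original: " + broken.equals(original));
        System.out.println("UTF-8 reader equals original: " + decoded.equals(original));
    }
}
```

The byte-by-byte version produces one char per byte, so the Cyrillic word comes back twice as long and unreadable, while the reader-based version returns the original string.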

Upvotes: 0

Alex Martelli

Reputation: 881675

Text strings are stored in the App Engine datastore as Unicode strings, so no byte encoding is applicable.

See, for example, https://cloud.google.com/appengine/docs/java/datastore/entities#Java_Properties_and_value_types -- a short text string (up to 500 Unicode characters) maps to a java.lang.String (thus, Unicode) and can be indexed; a long one (which can't be indexed) maps to a https://cloud.google.com/appengine/docs/java/javadoc/com/google/appengine/api/datastore/Text and is also stored as Unicode.

However, while the datastore supports Unicode (and thus text) directly, HTTP transports bytes, so text must be properly encoded before sending and decoded again on receipt. In HTTP, for both requests and responses, the encoding is specified by the charset= parameter of the Content-Type header, e.g.:

    Content-Type: text/plain; charset=utf-8

In a servlet, such as those used in the App Engine Java runtime, you set that header by calling the setContentType and setCharacterEncoding methods of the ServletResponse; see e.g. http://docs.oracle.com/javaee/5/api/javax/servlet/ServletResponse.html#setCharacterEncoding(java.lang.String) .
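
To make the bytes-versus-text point concrete, here is a minimal sketch using only the Java standard library (no servlet API). It encodes a body with the charset named in a Content-Type value and decodes it again on the "receiving" side. The class and method names are hypothetical, and the ISO-8859-1 fallback is an assumption standing in for a receiver that guesses a default when no charset is given.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical demo class; names are illustrative only.
final class ContentTypeCharsetDemo {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/plain; charset=utf-8"; returns the fallback if none is present.
    static Charset charsetOf(String contentType, Charset fallback) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // The parameter value may be quoted: charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                return Charset.forName(name);
            }
        }
        return fallback;
    }

    // Sender side: turn text into the bytes that go on the wire.
    static byte[] encodeBody(String text, String contentType) {
        // Assumption: ISO-8859-1 is used when the header names no charset.
        return text.getBytes(charsetOf(contentType, StandardCharsets.ISO_8859_1));
    }

    // Receiver side: turn the received bytes back into text.
    static String decodeBody(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        // Cyrillic sample text written with Unicode escapes.
        String text = "\u0437\u043d\u0430\u0447\u0435\u043d\u0438\u0435";
        String withCharset = "text/plain; charset=utf-8";
        String withoutCharset = "text/plain";

        byte[] wire = encodeBody(text, withCharset);
        System.out.println("decoded with charset=utf-8 matches: "
                + decodeBody(wire, withCharset).equals(text));
        System.out.println("decoded without charset matches:    "
                + decodeBody(wire, withoutCharset).equals(text));
    }
}
```

When the sender and receiver agree on charset=utf-8 the text survives the round trip; when the receiver has to guess, the same bytes decode to different characters, which is the kind of garbling described in the question.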

The OP clarifies in a comment:

non-English text to be stored in that [[Text]] field but while displaying it back, the non-English text is lost and is displayed in some weird way [[ as viewed in the App Engine console ]]

Given that, the likely cause of the problem is that the request sent to App Engine is not being properly serialized as UTF-8 bytes with the Content-Type specified above.

Of course, the header only applies to the request's body. To send Unicode text as part of the query string (e.g., in an HTTP GET), each UTF-8 encoded byte must additionally be percent-encoded as %HH. So, for example, to send значение, which in UTF-8 becomes \xd0\xb7\xd0\xbd\xd0\xb0\xd1\x87\xd0\xb5\xd0\xbd\xd0\xb8\xd0\xb5, you'd visit e.g.:

http://appid.appspot.com/x?y=%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D0%B5
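
In Java, this percent-encoding can be produced with java.net.URLEncoder. The following is a minimal sketch; the class and method names are hypothetical, and the URL is the placeholder from the example above. Note that URLEncoder implements form encoding, so it writes a space as + rather than %20.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; names are illustrative only.
final class QueryEncodingDemo {

    // Percent-encodes a query-string value using its UTF-8 bytes.
    static String encodeQueryValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Reverses encodeQueryValue, as a server would when parsing the query.
    static String decodeQueryValue(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The word from the example above, written with Unicode escapes.
        String value = "\u0437\u043d\u0430\u0447\u0435\u043d\u0438\u0435";
        String url = "http://appid.appspot.com/x?y=" + encodeQueryValue(value);
        // prints http://appid.appspot.com/x?y=%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D0%B5
        System.out.println(url);
    }
}
```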

Upvotes: 2
