Marian Pavel

Reputation: 2876

How to parse a very large JSON array with Moshi on Android?

I have a very large JSON file containing the words of a specific language, taken from a dictionary. The file holds more than 348,000 entries, and each object has different properties.

Here is an example of the JSON array:

[
...
{"id":"57414","form":"t'est","formNoAccent":"test","formUtf8General":"test","reverse":"tset","number":null,"description":"","noAccent":"0","consistentAccent":"1","frequency":"0.98","hyphenations":null,"pronunciations":null,"stopWord":"0","compound":"0","modelType":"N","modelNumber":"1","restriction":"","staleParadigm":"0","notes":"","hasApheresis":"0","hasApocope":"1","createDate":"1196798482","modDate":"1637245287"},
{"id":"57415","form":"ț'est","formNoAccent":"țest","formUtf8General":"țest","reverse":"tseț","number":null,"description":"","noAccent":"0","consistentAccent":"1","frequency":"0.93","hyphenations":null,"pronunciations":null,"stopWord":"0","compound":"0","modelType":"N","modelNumber":"24","restriction":"","staleParadigm":"0","notes":"","hasApheresis":"0","hasApocope":"1","createDate":"1196798482","modDate":"1637245213"},
...
]

I want to insert these entries into Room and have them persist there. The problem is that I haven't done anything like this before, and I get an OutOfMemoryError when I try to transform the whole file into a list of objects with Moshi.

The solution would be to load each item separately, but I don't think that is possible.

Until now, it looks like this:

        val archive = context.assets.open("table_lexeme.zip")
        val destination = File.createTempFile("table_lexeme", ".zip")
        val jsonFile = File.createTempFile("lexeme", ".json")

        // Copy the asset to disk without reading it fully into memory
        archive.use { input ->
            destination.outputStream().use { output -> input.copyTo(output) }
        }

        ZipFile(destination).use { zip ->
            zip.entries().asSequence().forEach { zipEntry ->
                if (zipEntry.name == "dex_table_lexeme.json") {
                    // Extract the JSON entry to a temporary file
                    zip.getInputStream(zipEntry).use { inputStream ->
                        jsonFile.outputStream().buffered().use { output ->
                            inputStream.copyTo(output)
                        }
                    }
                }
            }
        }

        val jsonReader = JsonReader(InputStreamReader(jsonFile.inputStream(), Charsets.UTF_8))
        jsonReader.beginArray()

Upvotes: 0

Views: 542

Answers (1)

CommonsWare

Reputation: 1006944

The literal solution is to switch to a streaming JSON parser, so that your whole data set is not loaded into RAM at once. JsonReader in the Android SDK works this way, Gson has a streaming mode, and Moshi ships its own streaming com.squareup.moshi.JsonReader, which lets you decode an array one element at a time.
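Whichever parser you pick, the shape of the loop is the same: open the array, then decode one object at a time instead of materializing a 348,000-element list. As a dependency-free illustration of that principle (the Android and Moshi readers only run with their respective SDKs on the classpath), here is a minimal sketch built around a hypothetical helper `forEachJsonObject`, which walks a JSON array with string-aware brace counting and hands each raw object to a callback, so only one object is ever buffered:

```kotlin
import java.io.Reader
import java.io.StringReader

// Hypothetical helper: scans a JSON array and invokes [onObject] with the raw
// text of each top-level object. It tracks string literals and escapes so that
// braces inside values don't confuse the depth counter. Only one object is
// held in memory at a time, which is the whole point of streaming.
fun forEachJsonObject(reader: Reader, onObject: (String) -> Unit) {
    val current = StringBuilder()
    var depth = 0
    var inString = false
    var escaped = false
    while (true) {
        val c = reader.read()
        if (c == -1) break
        val ch = c.toChar()
        if (depth > 0) current.append(ch)
        when {
            escaped -> escaped = false
            inString && ch == '\\' -> escaped = true
            ch == '"' -> inString = !inString
            inString -> { /* structural chars inside strings are just data */ }
            ch == '{' -> {
                if (depth == 0) current.append(ch)
                depth++
            }
            ch == '}' -> {
                depth--
                if (depth == 0) {
                    onObject(current.toString())
                    current.setLength(0)
                }
            }
        }
    }
}

fun main() {
    val json = """[{"id":"57414","form":"t'est"},{"id":"57415","notes":"{not a brace}"}]"""
    val ids = mutableListOf<String>()
    forEachJsonObject(StringReader(json)) { obj ->
        ids.add(obj) // in real code: hand obj to a parser, then insert into Room
    }
    println("parsed ${ids.size} objects")
}
```

With a real parser the callback body would be `adapter.fromJson(obj)` (Moshi) or `gson.fromJson(obj, Lexeme::class.java)`; the memory profile stays flat either way because nothing ever holds the full array.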

The realistic solution is to not package JSON at all. Importing those rows will be slow even if you batch the inserts in transactions (e.g., 100 rows per transaction). Since you are already packaging your data as an asset, you will be better off (IMHO) generating the SQLite database on your development machine and packaging that instead. Room has built-in support (createFromAsset()) for copying a packaged database out of your assets and putting it into position for use. While your database file will be large, copying it is far faster than building it on the fly from imported data.

Upvotes: 1
