Reputation: 2876
I have a very large JSON file containing dictionary words from a specific language. The file has more than 348,000 entries, and each object has a number of properties. Here is an excerpt of the JSON array:
[
...
{"id":"57414","form":"t'est","formNoAccent":"test","formUtf8General":"test","reverse":"tset","number":null,"description":"","noAccent":"0","consistentAccent":"1","frequency":"0.98","hyphenations":null,"pronunciations":null,"stopWord":"0","compound":"0","modelType":"N","modelNumber":"1","restriction":"","staleParadigm":"0","notes":"","hasApheresis":"0","hasApocope":"1","createDate":"1196798482","modDate":"1637245287"},
{"id":"57415","form":"ț'est","formNoAccent":"țest","formUtf8General":"țest","reverse":"tseț","number":null,"description":"","noAccent":"0","consistentAccent":"1","frequency":"0.93","hyphenations":null,"pronunciations":null,"stopWord":"0","compound":"0","modelType":"N","modelNumber":"24","restriction":"","staleParadigm":"0","notes":"","hasApheresis":"0","hasApocope":"1","createDate":"1196798482","modDate":"1637245213"},
...
]
I want to insert these entries into Room and have them persist there. The problem is that I haven't done anything like this before, and I am getting an OutOfMemoryError when I try to transform the whole file into a list of objects using Moshi.
The solution would be to load each item separately, but I don't know whether that is possible.
So far, my code looks like this:
// Copy the zipped asset to a temporary file
val archive = context.assets.open("table_lexeme.zip")
val destination = File.createTempFile("table_lexeme", ".zip")
val jsonFile = File.createTempFile("lexeme", ".json")

archive.use {
    destination.writeBytes(it.readBytes())
}

// Extract the JSON entry from the archive into its own temporary file
ZipFile(destination).use { zip ->
    zip.entries().asSequence().forEach { zipEntry ->
        if (zipEntry.name == "dex_table_lexeme.json") {
            zip.getInputStream(zipEntry).use { inputStream ->
                BufferedOutputStream(FileOutputStream(jsonFile)).use { bos ->
                    val bytesIn = ByteArray(BUFFER_SIZE)
                    var read: Int
                    while (inputStream.read(bytesIn).also { read = it } != -1) {
                        bos.write(bytesIn, 0, read)
                    }
                }
            }
        }
    }
}

// Start streaming the JSON array
val jsonReader = JsonReader(InputStreamReader(jsonFile.inputStream(), Charsets.UTF_8))
jsonReader.beginArray()
Upvotes: 0
Views: 542
Reputation: 1006944
The literal solution would be to switch to a streaming JSON parser, so that your whole data set is not loaded into RAM at once. JsonReader in the Android SDK works this way, and Gson has a streaming mode. I do not recall Moshi offering this, but I haven't looked for it recently.
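For illustration, here is a minimal streaming sketch built on the Android SDK's JsonReader. The LexemeEntity and LexemeDao names, the two mapped fields, and the batch size of 500 are assumptions for this example, not part of your schema:

import android.util.JsonReader
import androidx.room.Dao
import androidx.room.Entity
import androidx.room.Insert
import androidx.room.PrimaryKey

// Hypothetical Room entity and DAO; only two of the JSON fields are
// mapped here to keep the sketch short
@Entity(tableName = "lexeme")
data class LexemeEntity(@PrimaryKey val id: Long, val form: String)

@Dao
interface LexemeDao {
    @Insert
    fun insertAll(items: List<LexemeEntity>)
}

// Reads the array one object at a time and inserts in batches, so the
// full data set is never held in memory at once
fun importLexemes(reader: JsonReader, dao: LexemeDao, batchSize: Int = 500) {
    val batch = ArrayList<LexemeEntity>(batchSize)
    reader.beginArray()
    while (reader.hasNext()) {
        var id = 0L
        var form = ""
        reader.beginObject()
        while (reader.hasNext()) {
            when (reader.nextName()) {
                "id" -> id = reader.nextString().toLong()
                "form" -> form = reader.nextString()
                else -> reader.skipValue() // skip fields we don't map
            }
        }
        reader.endObject()
        batch += LexemeEntity(id, form)
        if (batch.size == batchSize) {
            dao.insertAll(batch) // one transaction per batch
            batch.clear()
        }
    }
    reader.endArray()
    if (batch.isNotEmpty()) dao.insertAll(batch)
}

Since Room wraps each insertAll() call in its own transaction, peak memory stays bounded by the batch size rather than by the size of the file.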
The realistic solution is to not package JSON. Importing those entries will be slow, even if you use transaction batches (e.g., insert 100 rows per transaction). Since you are already packaging your data as an asset, you will be better off (IMHO) generating the SQLite database on your development machine and packaging that instead. Room has built-in support (createFromAsset()) for copying a packaged database out of assets and putting it into position for use. While your database file is going to be large, copying it will be faster than creating it on the fly from imported data.
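As a sketch of that approach, assuming the prebuilt file is shipped under assets/database/lexemes.db and reusing the hypothetical LexemeEntity and LexemeDao from above:

import android.content.Context
import androidx.room.Database
import androidx.room.Room
import androidx.room.RoomDatabase

// Hypothetical database class; the entity list matches the sketch above
@Database(entities = [LexemeEntity::class], version = 1)
abstract class LexemeDatabase : RoomDatabase() {
    abstract fun lexemeDao(): LexemeDao
}

// createFromAsset() copies the packaged database into place before first
// use, instead of importing rows at runtime
fun buildDatabase(context: Context): LexemeDatabase =
    Room.databaseBuilder(context, LexemeDatabase::class.java, "lexemes.db")
        .createFromAsset("database/lexemes.db")
        .build()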
Upvotes: 1