Christian

Reputation: 904

Kotlin and realm: How to only insert nested RealmObjects when they do not exist?

I work with kotlin and the following dependencies:

id("io.realm.kotlin") version "1.7.0"

implementation("org.jetbrains.kotlinx:kotlinx-coroutines-core:1.6.4")
implementation("io.realm.kotlin:library-base:1.7.0")

General use case:

I download multiple CSV files, convert them to RealmObjects, and then try to save the list of RealmObjects. In this case, RealmObjects with the same PrimaryKey may be saved multiple times: for example, QuantityRealmObject is used in several RealmObjects within the parent (or root) object. I thought UpdatePolicy.Modified, which does not seem to exist for Kotlin (?), would do exactly that.

As Jay described, Upserting is probably the way to go in this case, however I am not sure if

I try to save data to a Realm DB, which works, but I have nested RealmObjects with @PrimaryKeys. Currently my code only works when the UpdatePolicy is set to ALL, which probably leads to a lot of unnecessary updates (and possibly a bigger file size?) but less actual data in the DB than when working with EmbeddedRealmObjects.

EDIT:

My problem is that my Example objects have references to other RealmObjects with PrimaryKeys (e.g. QuantityRealmObject), or references to other RealmObjects which in turn have references to QuantityRealmObjects. With UpdatePolicy.ALL I have the luxury that I can just call copyToRealm(exampleObject) and all references are saved correctly. If there is a duplicate primary key for a "nested" quantity object reference, it just updates it with the same values, and the references are still OK. If I want to upsert like you suggest, which of course works, I would have to check lots of "nested" realm object references for each copyToRealm(exampleObject) call:

val exampleObject

//query if exampleObject.field1.quantityRealmObject already exists, if not create it, if yes set this field to the already managed instance
//query if exampleObject.quantityRealmObject already exists, if not create it, if yes set this field to the already managed instance
//query if exampleObject.field2.field3.quantityRealmObject already exists, if not create it, if yes set this field to the already managed instance

// ... do that for lots of references

realm.copyToRealm(exampleObject)

//vs. 

realm.copyToRealm(exampleObject, UpdatePolicy.ALL)

I like the idea with error handling, however I am not sure how I could set the correct references in the ExampleObject in an error case.

fun createRealm(dbName: String, data: List<DataRealmObject>, schema: String) {
    val config = RealmConfiguration.Builder(
        schema = setOf(
            // a few RealmObject classes
        )
    )
        .compactOnLaunch()
        .build()

    val realm = Realm.open(config)

    realm.writeBlocking {
        // insert each object, updating any object whose primary key already exists
        data.forEach {
            copyToRealm(it, UpdatePolicy.ALL)
        }
    }

    realm.close()
}

When I do not set the UpdatePolicy to ALL, of course I get exceptions stating that an object with the PrimaryKey already exists. Is there a good solution to deal with this without setting the UpdatePolicy to ALL? Ideal would be something like: if an object with the given PrimaryKey does not exist, insert it, else use the already existing object.

I suspect that the massive number of updates on already existing objects has a negative effect on the file size of the Realm DB.

How could I solve this problem? I could query before each copy call if each nested RealmObject already exists, however this would be very complex since there are some basic types which occur in a lot of different fields.

EDIT:

An example object could look like this:

class Example : RealmObject {
    var field1: String = ""
    var anotherRealmObjectRef: Quantity? = null
    var anotherRealmObjectRef2: Another? = null
    // other fields which can contain references to objects with PrimaryKeys
}

class Quantity : RealmObject {
    @PrimaryKey
    var id: String = ""
    var value: Double = 0.0
    var unit: String = ""

    // constructor sets id to e.g. value_unit
}

class Another : RealmObject {
    // other fields
    var price: Quantity? = null
}

So, as I said, I download CSV files with data and convert each row to, in this case, Example realm objects. For each of those objects I must also create Quantity objects in multiple fields. I added an id field as PrimaryKey to Quantity because in reality I create maybe 1 million Example objects, but there will be only 10k unique Quantity objects. So I only want unique Quantity instances in my Realm DB to save space and keep the file size small. Before creating each Quantity object, I could check whether another Example object already contains a Quantity object with this PrimaryKey, like you showed in your code example. Due to the somewhat complex class structure, this would result in a lot of code and I am not sure if that is really feasible or good to do.

UpdatePolicy.ALL basically solves this for me, because the resulting Realm DB only consists of unique Quantity objects. However, it probably performs a lot of unnecessary updates on those objects.

The only real problem for me currently is that the resulting Realm DB has an unexpected file size (currently around 400-500 MB). A comparable Realm DB created with the Swift SDK is around 200 MB. If this is due to the mass updates (resulting in a lot of object versions?), it would be worth solving the issue.

Upvotes: 1

Views: 1016

Answers (1)

Jay

Reputation: 35657

There are several questions within the question so let me try to tackle them all. I prefer including code in answers but perhaps some clarity about how Realm works would be more beneficial. Some of this answer is IMO so evaluate accordingly.

TL;DR - skip to the Edit

The code in the question doesn't work as is because it's trying to brute force add an object that has the same primary key as an existing object; primary keys must be unique, so having two objects with the same primary key is not allowed.

The difference between .all and .modified is related to how the data is written (keep reading: Upsert, which may be an answer).

.all forces all properties of an object to be re-written, whether they have changed or not. That's a whole lot of data to push around and I would find use cases for this kinda rare.

.modified only writes out fields that have been modified so in general, it's far less data and the preferred option. It will also allow for Upsert which is what you're attempting to do.

.error: if you want to prevent updating an existing object, .error will throw an error if an object with the same primary key already exists.

Upsert'ing is the process where if an object exists, it will be updated. If it does not exist, it will be inserted. To cause this behavior, when an object is being manipulated, set the update flag to .modified and it will magically be inserted if needed, otherwise just the modified fields will be updated on the existing object. Note that you can also partially update an object by passing the primary key and a subset of the values to update.
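For the Kotlin SDK specifically, here is a minimal upsert sketch. It rests on an assumption: as far as I can tell, the Kotlin SDK's UpdatePolicy offers only ERROR and ALL (the .modified flag described above comes from other SDKs such as Swift), so ALL is the policy that gives upsert behavior there. The Quantity class mirrors the one in the question, and upsertAll is a hypothetical helper name.

```kotlin
import io.realm.kotlin.Realm
import io.realm.kotlin.UpdatePolicy
import io.realm.kotlin.types.RealmObject
import io.realm.kotlin.types.annotations.PrimaryKey

// Model mirroring the Quantity class from the question.
class Quantity : RealmObject {
    @PrimaryKey
    var id: String = ""
    var value: Double = 0.0
    var unit: String = ""
}

// Hypothetical helper: inserts each Quantity, or updates the stored one
// when an object with the same primary key already exists.
fun upsertAll(realm: Realm, quantities: List<Quantity>) {
    realm.writeBlocking {
        quantities.forEach { quantity ->
            // With UpdatePolicy.ALL a duplicate primary key does not throw;
            // the existing object is updated instead.
            copyToRealm(quantity, UpdatePolicy.ALL)
        }
    }
}
```

Calling upsertAll twice with objects that share an id should leave a single Quantity in the realm rather than throwing.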


The question mentions "nested objects" and that's a bit ambiguous (IMO) when it comes to Realm. Unfortunately the documentation kind of mixes 'nested' in, so that can lead to confusion.

Nested: In a tree sits a bird's nest with eggs. The eggs are nested; they are part of the nest and exist within and as part of the nest; they do not exist in other nests, only that nest.

Objects that are managed (and have a primary key) and are added to another object are not really "nested" - they do not become part of the parent object as they are stored by reference. The two objects are both managed and independent of each other, can exist without the other and, in the case of a referenced object, can be referenced from multiple other objects (so, not really nested).

Embedded objects on the other hand are more akin to a 'nested' object; they are not managed separately, do not/can not contain a primary key and are part of the parent object's graph.

Updating an embedded (nested) object is done through dot notation starting with the parent, e.g. parentObject.embeddedChildToUpdate.fieldToUpdate, and is not done using .modified or .all (in this context) since the field is being written directly. (Embedded objects would never be upserted, since they cannot exist without the parent.)
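To illustrate the dot notation update, here is a small sketch using the Kotlin SDK. Product, Price and updatePriceValue are hypothetical names made up for this example; they are not from the question.

```kotlin
import io.realm.kotlin.Realm
import io.realm.kotlin.ext.query
import io.realm.kotlin.types.EmbeddedRealmObject
import io.realm.kotlin.types.RealmObject
import io.realm.kotlin.types.annotations.PrimaryKey

// Hypothetical embedded type: no primary key, lives only inside its parent.
class Price : EmbeddedRealmObject {
    var value: Double = 0.0
    var unit: String = ""
}

// Hypothetical parent type that owns the embedded Price.
class Product : RealmObject {
    @PrimaryKey
    var id: String = ""
    var price: Price? = null
}

// Look up the parent by primary key, then write the embedded field directly.
fun updatePriceValue(realm: Realm, productId: String, newValue: Double) {
    realm.writeBlocking {
        val product = query<Product>("id == $0", productId).first().find()
        // Dot notation through the parent; no update policy is involved.
        product?.price?.value = newValue
    }
}
```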

It doesn't appear you're using embedded objects - everything seems to be by reference so a bit OT but I hope that helps.

Edit

If the goal is to persist objects that have a unique primary key and to ignore those that are duplicates, this should do it. Attempt to read an object with a given primary key, if it does not exist, persist a new object; if it does exist, ignore it and move on to the next one.

// widgetList is assumed to hold the unmanaged WidgetClass objects to persist
for (candidate in widgetList) {
   realm.write {
      // fetch the widget via its primary key
      val existing = query<WidgetClass>("_id == $0", candidate._id).first().find()

      // if there is no widget with that primary key, persist it
      if (existing == null) {
         copyToRealm(candidate)
      }

      // if existing is not null, a widget with that primary key exists
      //  so don't persist it (i.e. ignore it)
   }
}

This process does not need .all or .modified or even an upsert
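The same read-then-persist idea can be applied to referenced objects such as Quantity in the question. The sketch below assumes that copyToRealm leaves references to already managed objects untouched when it copies an unmanaged parent; that is my understanding of the Kotlin SDK, so verify it against your version. findOrCreate and insertExample are hypothetical helper names, and only one reference field is shown.

```kotlin
import io.realm.kotlin.MutableRealm
import io.realm.kotlin.ext.query
import io.realm.kotlin.types.RealmObject
import io.realm.kotlin.types.annotations.PrimaryKey

// Models mirroring the question (reduced to one reference field).
class Quantity : RealmObject {
    @PrimaryKey
    var id: String = ""
    var value: Double = 0.0
    var unit: String = ""
}

class Example : RealmObject {
    var field1: String = ""
    var anotherRealmObjectRef: Quantity? = null
}

// Hypothetical helper: return the managed Quantity with this primary key,
// or persist the candidate if none exists yet.
fun MutableRealm.findOrCreate(candidate: Quantity): Quantity =
    query<Quantity>("id == $0", candidate.id).first().find() ?: copyToRealm(candidate)

// Hypothetical helper: swap the unmanaged reference for a managed one,
// then persist the parent with the default policy (UpdatePolicy.ERROR).
fun MutableRealm.insertExample(example: Example): Example {
    example.anotherRealmObjectRef = example.anotherRealmObjectRef?.let { findOrCreate(it) }
    return copyToRealm(example)
}
```

Both helpers must be called inside a write transaction, e.g. realm.writeBlocking { insertExample(example) }. Every reference field that can hold a Quantity needs the same treatment, which is the extra code the question is worried about.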

The other option is to attempt to write each object - if an object with that primary key already exists, an error will be thrown. Handle the error gracefully (pretty much do nothing) and then move on to the next object.
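A rough sketch of that error handling approach in the Kotlin SDK follows. It assumes that copyToRealm with the default policy throws IllegalArgumentException on a duplicate primary key, which is what the SDK documentation describes as far as I know. insertSkippingDuplicates is a hypothetical helper, and the example uses a flat object on purpose: for a graph with references, check what has already been written when the exception is raised.

```kotlin
import io.realm.kotlin.Realm
import io.realm.kotlin.types.RealmObject
import io.realm.kotlin.types.annotations.PrimaryKey

// Model mirroring the Quantity class from the question.
class Quantity : RealmObject {
    @PrimaryKey
    var id: String = ""
    var value: Double = 0.0
    var unit: String = ""
}

// Hypothetical helper: tries to insert every object and skips the ones whose
// primary key already exists. Returns how many objects were skipped.
fun insertSkippingDuplicates(realm: Realm, quantities: List<Quantity>): Int {
    var skipped = 0
    realm.writeBlocking {
        for (quantity in quantities) {
            try {
                // Default policy is UpdatePolicy.ERROR.
                copyToRealm(quantity)
            } catch (e: IllegalArgumentException) {
                // Assumed to signal a duplicate primary key; ignore and move on.
                skipped++
            }
        }
    }
    return skipped
}
```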

Upvotes: 1
