pdiffley

Reputation: 713

How quickly can Realm return sorted data?

Realm allows you to receive the results of a query in sorted order.

let realm = try! Realm()
let dogs = realm.objects(Dog.self)
let dogsSorted = dogs.sorted(byKeyPath: "name", ascending: false)

I ran this test to see how quickly Realm returns sorted data:

import Foundation
import RealmSwift

class TestModel: Object {
    @Persisted(indexed: true) var value: Int = 0
}

class RealmSortTest {
    let documentCount = 1000000
    var smallestValue: TestModel = TestModel()
    
    func writeData() {
        let realm = try! Realm()
        var documents: [TestModel] = []
        for _ in 0..<documentCount { // half-open range: creates exactly documentCount objects
            let newDoc = TestModel()
            newDoc.value = Int.random(in: 0 ... Int.max)
            documents.append(newDoc)
        }
        try! realm.write {
            realm.deleteAll()
            realm.add(documents)
        }
    }
    
    func readData() {
        let realm = try! Realm()
        let sortedResults = realm.objects(TestModel.self).sorted(byKeyPath: "value")
                
        let start = Date()
        
        self.smallestValue = sortedResults[0]
        
        let end = Date()
        let delta = end.timeIntervalSinceReferenceDate - start.timeIntervalSinceReferenceDate
        print("Time taken to load smallest value: \(delta)")
    }
    
    func updateSmallestValue() {
        let realm = try! Realm()
        let sortedResults = realm.objects(TestModel.self).sorted(byKeyPath: "value")

        smallestValue = sortedResults[0]
        
        print("Originally loaded smallest value: \(smallestValue.value)")
        
        let newSmallestValue = TestModel()
        newSmallestValue.value = smallestValue.value - 1
        try! realm.write {
            realm.add(newSmallestValue)
        }
        
        print("Originally loaded smallest value after write: \(smallestValue.value)")
        
        let readStart = Date()
        smallestValue = sortedResults[0]
        let readEnd = Date()
        let readDelta = readEnd.timeIntervalSinceReferenceDate - readStart.timeIntervalSinceReferenceDate
        print("Reloaded smallest value \(smallestValue.value)")
        print("Time taken to reload the smallest value: \(readDelta)")
    }
}

With documentCount = 100000, readData() output:

Time taken to load smallest value: 0.48901796340942383

and updateSmallestValue() output:

Originally loaded smallest value: 2075613243102
Originally loaded smallest value after write: 2075613243102
Reloaded smallest value 2075613243101
Time taken to reload the smallest value: 0.4624580144882202

With documentCount = 1000000, readData() output:

Time taken to load smallest value: 4.807577967643738

and updateSmallestValue() output:

Originally loaded smallest value: 4004790407680
Originally loaded smallest value after write: 4004790407680
Reloaded smallest value 4004790407679
Time taken to reload the smallest value: 5.2308430671691895

The time taken to retrieve the first document from a sorted result set scales with the number of documents stored in Realm rather than the number of documents being retrieved: ten times as many documents took roughly ten times as long. This indicates to me that Realm is sorting all of the documents at query time rather than when the documents are written. Is there a way to index the data so that a small number of sorted documents can be retrieved quickly?
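For anyone repeating the measurement, the timing can be wrapped in a small helper. This is only a sketch: `measure` is a hypothetical helper of my own (not a RealmSwift API), it uses the monotonic `DispatchTime` clock instead of `Date`, and the array sort below is just a stand-in workload, not a Realm query.

```swift
import Foundation

/// Hypothetical helper: runs `work` once and returns its result
/// together with the elapsed time in seconds (monotonic clock).
func measure<T>(_ work: () throws -> T) rethrows -> (result: T, seconds: Double) {
    let start = DispatchTime.now().uptimeNanoseconds
    let result = try work()
    let end = DispatchTime.now().uptimeNanoseconds
    return (result, Double(end - start) / 1_000_000_000)
}

// Stand-in workload: find the smallest of many random integers by sorting all of them.
let values = (0..<100_000).map { _ in Int.random(in: 0...Int.max) }
let timed = measure { values.sorted().first }
print("Smallest: \(String(describing: timed.result)), time taken: \(timed.seconds)")
```

In the test above, the closure passed to `measure` would be `{ sortedResults[0] }`.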

Edit:

Following discussion in the comments, I updated the code to load only the smallest value from the sorted collection.

Edit 2:

I updated the code to observe the results as suggested in the comments.

import Foundation
import RealmSwift

class TestModel: Object {
    @Persisted(indexed: true) var value: Int = 0
}

class RealmSortTest {
    let documentCount = 1000000
    var smallestValue: TestModel = TestModel()
    var storedResults: Results<TestModel> = (try! Realm()).objects(TestModel.self).sorted(byKeyPath: "value")
    var resultsToken: NotificationToken? = nil
    
    func writeData() {
        let realm = try! Realm()
        var documents: [TestModel] = []
        for _ in 0..<documentCount { // half-open range: creates exactly documentCount objects
            let newDoc = TestModel()
            newDoc.value = Int.random(in: 0 ... Int.max)
            documents.append(newDoc)
        }
        try! realm.write {
            realm.deleteAll()
            realm.add(documents)
        }
    }
    
    func observeData() {
        let realm = try! Realm()
        print("Loading Data")
        let startTime = Date()
        self.storedResults = realm.objects(TestModel.self).sorted(byKeyPath: "value")
        self.resultsToken = self.storedResults.observe { changes in
            let observationTime = Date().timeIntervalSince(startTime)
            print("Time to first observation: \(observationTime)")
            let firstTenElementsSlice = self.storedResults[0..<10]
            let elementsArray = Array(firstTenElementsSlice) //print this if you want to see the elements
            elementsArray.forEach { print($0.value) }
            let moreElapsed = Date().timeIntervalSince(startTime)
            print("Time to printed elements: \(moreElapsed)")
        }
    }
}

and I got the following output:

Loading Data
Time to first observation: 5.252112984657288
3792614823099
56006949537408
Time to printed elements: 5.253015995025635

Reading the data with an observer did not reduce the time taken to read the data.

Upvotes: 0

Views: 1182

Answers (2)

pdiffley

Reputation: 713

At this time it appears that Realm sorts data when it is accessed rather than when it is written, and there is no way to have Realm sort data at write time. This means that the time to access sorted data scales with the number of documents in the database rather than the number of documents being accessed.
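If only the single smallest value is needed, one thing that may be worth trying is asking Realm for the minimum directly instead of going through a sorted `Results`. This is a sketch I have not benchmarked: `smallestObject(in:)` is a hypothetical helper name, and whether it is actually faster than `sorted(byKeyPath:)[0]` for a given dataset is an assumption to verify.

```swift
import RealmSwift

class TestModel: Object {
    @Persisted(indexed: true) var value: Int = 0
}

/// Hypothetical helper: returns an object holding the smallest `value`,
/// or nil when the Realm contains no TestModel objects.
func smallestObject(in realm: Realm) -> TestModel? {
    let all = realm.objects(TestModel.self)
    // Aggregate query: asks for the minimum without requesting a sorted view.
    guard let minValue = all.min(ofProperty: "value") as Int? else { return nil }
    // Equality lookup on the `value` property to get the object itself.
    return all.filter("value == %@", minValue).first
}
```

This does not help when the first N sorted objects are needed; it only covers the "smallest (or largest, via `max(ofProperty:)`) value" case.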

The actual time taken to access the data varies by use case and platform.

Upvotes: 1

Jay

Reputation: 35648

dogs and dogsSorted are Realm Results collection objects that essentially contain pointers to the underlying data, not the data itself.

Defining a sort order does NOT load all of the objects; they remain lazy, loading only as needed. That is one of the huge benefits of Realm: giant datasets can be used without worrying about overloading memory.

It's also one of the reasons that Realm Results objects always reflect the current state of the underlying data; that data can change many times and what you see in your app's Results vars (and Realm Collections in general) will always show the updated data.

As a side note, at this time working with Realm Collection objects using Swift high-level functions causes that data to load into memory, so don't do that. Sort, filter, etc. with Realm functions and everything stays lazy and memory friendly.
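To make the difference concrete, here is a rough sketch of the two styles side by side. The `Dog` model and both function names are assumptions for illustration; only the Realm calls themselves (`objects`, `filter` with a predicate string, `sorted(byKeyPath:ascending:)`) are real API.

```swift
import RealmSwift

// Hypothetical model matching the `Dog` used in the question.
class Dog: Object {
    @Persisted var name: String = ""
}

func lazyDogs(in realm: Realm) -> Results<Dog> {
    // Realm's own filter and sort: the return value is still a Results collection.
    realm.objects(Dog.self)
        .filter("name BEGINSWITH %@", "S")
        .sorted(byKeyPath: "name", ascending: false)
}

func eagerDogs(in realm: Realm) -> [Dog] {
    // Copying into a Swift Array first, then using the standard library's
    // filter and sorted(by:): every Dog object ends up in the array.
    Array(realm.objects(Dog.self))
        .filter { $0.name.hasPrefix("S") }
        .sorted { $0.name > $1.name }
}
```

Both return the same dogs in the same order; the first keeps everything inside Realm, the second is the pattern to avoid on large datasets.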

Indexing is a trade-off; on one hand it can improve the performance of certain queries like an equality ( "name == 'Spot'" ) but on the other hand it can slow down write performance. Additionally, adding indexes takes up a bit more space.

Generally speaking, indexing is best for specific use cases; maybe a situation where you are doing some kind of type-ahead autofill where performance is critical. We have several apps with very large datasets (GBs) and nothing is indexed because the performance advantage received is offset by slower writes, which are done frequently. I suggest starting without indexing.
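For reference, this is roughly what opting a single property into an index looks like, along with the kind of equality query it is meant to help. `IndexedDog` and `dogsNamed(_:in:)` are made-up names for this sketch, and how much the index helps (or how much it costs on writes) depends on the dataset, so measure before committing to it.

```swift
import RealmSwift

// Hypothetical model: `name` is indexed, `breed` is not.
class IndexedDog: Object {
    @Persisted(indexed: true) var name: String = ""
    @Persisted var breed: String = ""
}

/// Hypothetical helper: equality query on the indexed `name` property.
func dogsNamed(_ name: String, in realm: Realm) -> Results<IndexedDog> {
    realm.objects(IndexedDog.self).filter("name == %@", name)
}
```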

EDIT:

Going to update the answer based on additional discussion.

First and foremost, copying data from one object to another is not a measure of database loading performance. The real objective here is the user experience and/or being able to access that data - from the time the user expects to see the data to when it's shown. So let's provide some code to demonstrate general performance:

We'll first start with a similar model to what the OP used:

class TestModel: Object {
    @Persisted(indexed: true) var value: Int = 0
    
    convenience init(withIndex: Int) {
        self.init()
        self.value = withIndex
    }
}

Then define a couple of vars: one to hold the Results from disk, and a notification token, which lets us know when that data is available to be displayed to the user. Lastly, a var holds the time when loading starts:

var modelResults: Results<TestModel>!
var modelsToken: NotificationToken?
var startTime = Date()

Here's the function that writes lots of data. The objectCount value is changed from 10,000 objects on the first run to 1,000,000 objects on the second. Note this is bad coding, as I am creating a million objects in memory, so don't do this; it is for demonstration purposes only.

func writeLotsOfData() {
    let realm = try! Realm()
    let objectCount = 1000000
    autoreleasepool {
        var testModelArray = [TestModel]()
        for _ in 0..<objectCount {
            let m = TestModel(withIndex: Int.random(in: 0 ... Int.max))
            testModelArray.append(m)
        }

        try! realm.write {
            realm.add(testModelArray)
        }
        
        print("data written: \(testModelArray.count) objects")
    }
}

And finally, the function that loads those objects from Realm and outputs when the data is available to be shown to the user. Note they are sorted per the original question, and in fact will maintain their sort as data is added and changed! Pretty cool stuff.

func loadBigData() {
    let realm = try! Realm()
    print("Loading Data")
    
    self.startTime = Date()
    self.modelResults = realm.objects(TestModel.self).sorted(byKeyPath: "value")
    self.modelsToken = self.modelResults?.observe { changes in
        let elapsed = Date().timeIntervalSince(self.startTime)
        print("Load completed of \(self.modelResults.count) objects -  elapsed time of \(elapsed)")
    }
}

And the results, from two runs: one with 10,000 objects and one with 1,000,000 objects:

data written: 10000 objects
Loading Data
Load completed of 10000 objects -  elapsed time of 0.0059670209884643555

data written: 1000000 objects
Loading Data
Load completed of 1000000 objects -  elapsed time of 0.6800119876861572

There are three things to note:

  1. A Realm Notification object fires an event when the data has completed loading, and also when there are additional changes. We are leveraging that to notify the app when the data has completed loading and is available to be used - shown to the user for example.

  2. We are lazily loading all of the objects! At no point are we going to run into a memory overloading issue. Once the objects have loaded into the results, they are then freely available to be shown to the user or processed in whatever way is needed. It is super important to work with Realm objects in a Realm way when working with large datasets. Generally speaking, if it's 10 objects, well, no problem tossing them into an array, but when there are 1 million objects, let Realm do its lazy job.

  3. The app is protected using the above code and techniques. There could be 10 objects or 1,000,000 objects and the memory impact is minimal.
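As a sketch of point 1, the observer closure can distinguish the first delivery from later changes by switching on the `RealmCollectionChange` value it receives. `SortedModelObserver` and its method names are hypothetical wrappers for this example; the `.initial`, `.update` and `.error` cases are the ones RealmSwift defines. Note that `observe` needs to be called from a thread with a run loop (the main thread, typically).

```swift
import RealmSwift

class TestModel: Object {
    @Persisted(indexed: true) var value: Int = 0
}

// Hypothetical wrapper that keeps the Results and the token alive together.
final class SortedModelObserver {
    private var results: Results<TestModel>?
    private var token: NotificationToken?

    func start(in realm: Realm) {
        let sorted = realm.objects(TestModel.self).sorted(byKeyPath: "value")
        results = sorted
        token = sorted.observe { change in
            switch change {
            case .initial(let models):
                // First delivery: the sorted results are ready to be used.
                print("Initial results ready: \(models.count) objects")
            case .update(let models, let deletions, let insertions, let modifications):
                // Later deliveries: something changed in the underlying data.
                print("Now \(models.count) objects; \(insertions.count) inserted, "
                    + "\(deletions.count) deleted, \(modifications.count) modified")
            case .error(let error):
                print("Observation failed: \(error)")
            }
        }
    }

    func stop() {
        // Stops further notifications from being delivered.
        token?.invalidate()
        token = nil
    }
}
```

The token has to be held somewhere that outlives the function call (here, a property), otherwise the observation ends as soon as the token is deallocated.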

EDIT 2

(see comment to the OP's question for more info about this edit)

Per a request from the OP, who wanted to see the same exercise with printed values and times, here's the updated code:

self.modelsToken = self.modelResults?.observe { changes in
    let elapsed = Date().timeIntervalSince(self.startTime)
    print("Load completed of \(self.modelResults.count) objects -  elapsed time of \(elapsed)")
    print("print first 10 object values")
    let firstTenElementsSlice = self.modelResults[0..<10]
    let elementsArray = Array(firstTenElementsSlice) //print this if you want to see the elements
    elementsArray.forEach { print($0.value)}
    let moreElapsed = Date().timeIntervalSince(self.startTime)
    print("Printing of 10 elements completed: \(moreElapsed)")
}

and then the output:

Loading Data
Load completed of 1000000 objects -  elapsed time of 0.6730009317398071
print first 10 object values
12264243738520
17242140785413
29611477414437
31558144830373
32913160803785
45399774467128
61700529799916
63929929449365
73833938586206
81739195218861
Printing of 10 elements completed: 0.6745189428329468

Upvotes: 0
