Psyfun

Reputation: 369

Improve performance of inserting data for one to many relationship in EF

I am getting really poor performance in EF because of a particular design structure for my database. Here are the relevant relationships:


[ERD image]


I have the following data model:


public class Sensor
{
    [Key]
    public int Id { get; set; }

    [Required, MaxLength(64)]
    public string Name { get; set; }

    [Required, ForeignKey("Type")]
    public int SensorTypeId { get; set; }

    public virtual SensorType Type { get; set; }

    public virtual ICollection<SensorSample> SensorSamples { get; set; }
}

public class SensorSample
{
    [Key]
    public int Id { get; set; }

    [Required, ForeignKey("Sensor")]
    public int SensorId { get; set; }

    public virtual Sensor Sensor { get; set; }

    [Required]
    public DateTime SampleTime { get; set; }

    [Required]
    public virtual ICollection<SampleData> SampleData { get; set; }
}

public class SampleData
{
    [Key]
    public int Id { get; set; }

    [Required, ForeignKey("DataType")]
    public int SampleDataTypeId { get; set; }

    public virtual SampleDataType DataType { get; set; }

    [Required, ForeignKey("Unit")]
    public int SampleUnitId { get; set; }

    public virtual SampleUnit Unit { get; set; }

    [Required, ForeignKey("Sample")]
    public int SensorSampleId { get; set; }

    public virtual SensorSample Sample { get; set; }

    [MaxLength(128)]
    public string Value { get; set; }
}

Because a SensorSample can have multiple data sample types (e.g. temperature, pressure, etc.), an INSERT must first query for an existing sample so that the data is associated with the correct SampleTime. This is done using the following code:


SensorSample sample = null;
foreach (var d in input)
{
    SampleData data = new SampleData();
    data.SampleDataTypeId = dataTypeId;
    data.SampleUnitId = unitId;
    data.Value = d.Value;

    // check for existing sample for this sensor and timestamp
    sample = SensorSamples.FirstOrDefault(s => s.SensorId == sensor.Id && s.SampleTime == d.Timestamp);
    if (sample == null)
    {
        // sample doesn't exist, create a new one
        sample = new SensorSample();
        sample.SampleTime = d.Timestamp;
        sample.SensorId = sensor.Id;
        sample.SampleData = new List<SampleData>();  // initialise the collection before adding to it
        sensor.SensorSamples.Add(sample);
    }
    // add the data to the sample
    sample.SampleData.Add(data);
}

I have tried to optimize the insertion of sample data by doing it in batches (i.e. 1000 records at a time). This does help, but even though there is an index on the SampleTime field, the lookup query seems to take longer as more records are added.
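For illustration, the batching amounts to calling SaveChanges every 1000 records, roughly like this (a sketch only; the db context variable name is an assumption):

    int pending = 0;
    foreach (var d in input)
    {
        // ... build SampleData and find/create the SensorSample as in the loop above ...

        if (++pending % 1000 == 0)
        {
            db.SaveChanges();   // flush the current batch of 1000 records
        }
    }
    db.SaveChanges();           // flush any remaining records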

So, my question is, how do I improve the design and/or performance of adding sample data to the database? Is there a better database structure for handling the one-to-many relationship? I am willing to make some compromises on database design if I can get an appropriate offset in performance, but I still need to be able to handle different data associated with a given SampleTime.

Upvotes: 1

Views: 1535

Answers (3)

Colin

Reputation: 22595

EF6 beta 1 has an AddRange function that may suit your purpose:

INSERTing many rows with Entity Framework 6 beta 1

Note that the article I link to also covers the EF5 technique of setting AutoDetectChangesEnabled to false, which @felipe mentions in his answer.
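A minimal sketch of what that might look like for this model (the SensorContext type, the pre-built batch list, and the already-known parent sample id are assumptions, not part of the linked article):

    // Sketch: build all new rows up front, then add them in one call.
    var batch = new List<SampleData>();
    foreach (var d in input)
    {
        batch.Add(new SampleData
        {
            SampleDataTypeId = dataTypeId,
            SampleUnitId = unitId,
            SensorSampleId = sampleId,   // assumes the parent SensorSample is already known
            Value = d.Value
        });
    }

    using (var context = new SensorContext())
    {
        // AddRange detects changes once for the whole batch instead of once per Add.
        context.Set<SampleData>().AddRange(batch);
        context.SaveChanges();
    }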

Upvotes: 0

phil soady

Reputation: 11328

To maximize load performance for test data:

Don't run the project in Debug mode (it is several times slower for EF).

Use these settings:

    Context.Configuration.LazyLoadingEnabled = false;
    Context.Configuration.ProxyCreationEnabled = false;
    Context.Configuration.AutoDetectChangesEnabled = false;
    Context.Configuration.ValidateOnSaveEnabled = false;

Every 100 entries or fewer, discard the context and create a fresh one inside a using block.

And try

    Context.Set<TPoco>().AddOrUpdate(poco);

instead of

    Context.Set<TPoco>().FirstOrDefault(lambda);
    Context.Set<TPoco>().Add(poco);
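Putting those points together, a rough sketch (the SensorContext type, the allSamples source, and the batch size are assumptions):

    const int batchSize = 100;
    foreach (var chunk in allSamples
        .Select((s, i) => new { s, i })
        .GroupBy(x => x.i / batchSize, x => x.s))
    {
        using (var context = new SensorContext())
        {
            context.Configuration.LazyLoadingEnabled = false;
            context.Configuration.ProxyCreationEnabled = false;
            context.Configuration.AutoDetectChangesEnabled = false;
            context.Configuration.ValidateOnSaveEnabled = false;

            foreach (var sample in chunk)
            {
                // AddOrUpdate (System.Data.Entity.Migrations) does its own lookup,
                // so there is no separate FirstOrDefault + Add.
                context.Set<SensorSample>().AddOrUpdate(
                    s => new { s.SensorId, s.SampleTime },   // natural key to match on
                    sample);
            }

            context.SaveChanges();
        }
    }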

Upvotes: 1

felipe

Reputation: 682

Entity Framework maintains a local cache of all tracked entities and detects any changes made to those entities. As the number of tracked entities grows, this change checking gets more expensive.

Here is a very interesting post series on how DetectChanges works and what you can do about it. Look especially at part 3.

When I need to bulk load a lot of data I disable DetectChanges and also clear the local cache after saving, so that memory can be freed:

    public static void ClearDbSet<T>(this DbContext context) where T : class {
        // Detach every tracked entity of type T that has already been saved (Unchanged)
        var entries = context.ChangeTracker.Entries<T>().Where(e => e.State == EntityState.Unchanged);

        foreach (DbEntityEntry<T> entry in entries.ToList()) {
            entry.State = EntityState.Detached;
        }
    }

The ToList call is necessary; otherwise the iterator will throw an exception, because detaching entries modifies the collection being enumerated.
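As an example, a bulk-load loop might combine the two like this (a sketch; the SensorContext type, the data source, and the 1000-row batch size are assumptions):

    using (var context = new SensorContext())
    {
        context.Configuration.AutoDetectChangesEnabled = false;

        int count = 0;
        foreach (var data in sampleDataToInsert)
        {
            context.Set<SampleData>().Add(data);

            if (++count % 1000 == 0)
            {
                context.ChangeTracker.DetectChanges();  // run the scan once per batch
                context.SaveChanges();
                context.ClearDbSet<SampleData>();       // detach saved rows to free memory
            }
        }

        context.ChangeTracker.DetectChanges();
        context.SaveChanges();
    }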

Upvotes: 1
