johngreen283
johngreen283

Reputation: 57

Best way of bulk inserting a list of records in DuckDB using DuckDB.NET

Context

I'm working with the awesome DuckDB.NET library and C# for this example. I'm specifically working with the ADO.NET provider.

Problem

My database contains the following table:

CREATE TABLE tbl01 (
    Id INTEGER,
    TextListTest TEXT[],
    DateTest DATE
);

In the program each record is encapsulated by a class:

class Record
{
    public int Id { get; set; }
    public List<string> TextListTest { get; set; };
    public DateTime TextListTest { get; set; };
}

and is appended to a List<Record>. This list gets very large so I would like to avoid the per-row overhead of INSERT statements in a loop. The documentation says that if I absolutely must use inserts in this manner I should also wrap them in calls of BEGIN TRANSACTION and COMMIT. I really don't want to miss out on insert performance here. Is there another approach I can use with the library I'm using?

I noticed in the DuckDB.NET sample with the LowLevelBindingsSample() method that I could use prepared statements but I'm not sure that if that would confer any performance benefits.

Is there an approach I'm missing - perhaps the appender? If someone could provide an example using the 3 specific data types in the above table that would be greatly appreciated (I'm having trouble figuring out the LIST column).

using DuckDB.NET.Data;

namespace DuckTest;

class Record
{
    public int Id { get; set; }
    public List<string> TextListTest { get; set; }
    public DateTime DateTest { get; set; }
}

class Program
{
    public static void Main(string[] args)
    {
        // pretend this is a really big list!
        List<Record> recordList = new List<Record>
        {
            new Record { Id = 1, TextListTest = new List<string> { "Ball", "Horse" }, DateTest = new DateTime(1994, 12, 3) },
            new Record { Id = 2, TextListTest = new List<string> { "Matthew", "Giorgi is cool!" }, DateTest = new DateTime(1998, 11, 28) },
            new Record { Id = 3, TextListTest = new List<string> { "Red", "Black", "Purple" }, DateTest = new DateTime(1999, 9, 13) },
            new Record { Id = 4, TextListTest = new List<string> { "Cat" }, DateTest = new DateTime(1990, 2, 5) },
        };

        using (var duckDBConnection = new DuckDBConnection("Data Source=db01.duckdb"))
        {
            duckDBConnection.Open();

            var command = duckDBConnection.CreateCommand();

            command.CommandText = "CREATE TABLE tbl01 ( Id INTEGER, TextListTest TEXT[], DateTest DATE );";
            var executeNonQuery = command.ExecuteNonQuery();

            // I could do this in a loop but there's probably a better way...
            command.CommandText = "INSERT INTO tbl01 VALUES (1, ['Ball', 'Horse'], '1994-12-03');";
            executeNonQuery = command.ExecuteNonQuery();
        }
    }
}

I would be willing to use the low level bindings library if needed.

Upvotes: 1

Views: 2950

Answers (1)

Giorgi
Giorgi

Reputation: 30883

The library has support for appenders so that should be the fastest way to import data.

Check out the Bulk Data Loading for more details and options.

Upvotes: 2

Related Questions