BurningFlan

Reputation: 271

Is there a good way to import unsorted data into QuestDB in bulk?

I have a 100GB data file with sensor readings spanning a few weeks. The timestamps are not in strict chronological order and I'd like to bulk load the data into QuestDB. The order is not completely random: some rows arrive up to three minutes late. Is there an efficient way to bulk load the data and ensure it ends up ordered chronologically at the same time?

Upvotes: 1

Views: 233

Answers (1)

Brian Smith

Reputation: 1315

The most efficient way to do this is a three-step process:

  1. Import the unordered dataset; you can do this via curl:

    curl -F data=@unordered-data.csv 'http://localhost:9000/imp'
    
  2. Create a table with the schema of the imported data and apply a partitioning strategy. The WHERE 1 != 1 clause copies the schema without copying any rows, and the timestamp column can be cast explicitly in case auto-detection of the timestamp failed:

    CREATE TABLE ordered AS (
    SELECT
      cast(timestamp AS timestamp) timestamp,
      col1,
      col2
    FROM 'unordered-data.csv' WHERE 1 != 1
    ) timestamp(timestamp) PARTITION BY DAY;
    
  3. Insert the unordered records into the partitioned table and provide a lag and batch size:

    INSERT batch 100000 lag 180000000 INTO ordered
    SELECT
      cast(timestamp AS timestamp) timestamp,
      col1,
      col2
    FROM 'unordered-data.csv';
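
To see what the lag parameter does conceptually, here is a minimal Python sketch of the idea (an illustration only, not QuestDB's implementation): incoming rows are buffered, and a row is only committed once it is older than the latest timestamp seen minus the lag, which lets rows that are at most "lag" late slot into the correct order.

```python
import heapq

def lag_commit(rows, lag_micros):
    """Illustrative sketch of lag-based reordering: buffer rows in a min-heap
    and commit, in timestamp order, only rows older than (latest - lag).
    Timestamps and lag are in microseconds. Not QuestDB's actual code."""
    heap = []        # pending (timestamp, payload) rows
    committed = []
    latest = None
    for ts, payload in rows:
        heapq.heappush(heap, (ts, payload))
        latest = ts if latest is None else max(latest, ts)
        # rows outside the lag window can no longer be overtaken: commit them
        while heap and heap[0][0] <= latest - lag_micros:
            committed.append(heapq.heappop(heap))
    # end of stream: flush whatever is still buffered
    while heap:
        committed.append(heapq.heappop(heap))
    return committed

# rows arrive slightly out of order; 'c' is 3 seconds late
rows = [(0, 'a'), (5_000_000, 'b'), (2_000_000, 'c'), (10_000_000, 'd')]
print([p for _, p in lag_commit(rows, 3_000_000)])  # ['a', 'c', 'b', 'd']
```

Rows later than the lag window would still land out of order, which is why the lag should cover the worst expected lateness.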
    

To confirm that the table is ordered, the isOrdered() function may be used:

    SELECT isOrdered(timestamp) FROM ordered;

    isOrdered
    ---------
    true

There is more info on loading data this way in the CSV import documentation.

  • lag is the expected lateness of records, given in microseconds; in your case it should be about 3 minutes (180000000 microseconds)
  • batch is the number of records to process in one batch
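
Since lag is given in microseconds, the value used above can be checked with a quick conversion:

```python
# convert the expected 3 minutes of lateness to microseconds for the lag parameter
expected_lateness_minutes = 3
lag_micros = expected_lateness_minutes * 60 * 1_000_000
print(lag_micros)  # 180000000
```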

Upvotes: 2
