Reputation: 271
I have a 100GB data file with sensor readings spanning a few weeks. The timestamps are not in strict chronological order and I'd like to bulk load the data into QuestDB. The order is not completely random, but some rows arrive up to three minutes late. Is there an efficient way to bulk load the data and ensure it is ordered chronologically at the same time?
Upvotes: 1
Views: 233
Reputation: 1315
The most efficient way to do this is in three steps:
1. Import the unordered dataset; you can do this via curl:
curl -F data=@unordered-data.csv 'http://localhost:9000/imp'
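The /imp endpoint also accepts optional query parameters. As a sketch (the name and overwrite parameters exist in the REST API, but check the docs for your QuestDB version), you can pick the target table name explicitly; the rest of this answer assumes the default name, which is the file name:
# Hypothetical variant: import into a table named "unordered_data",
# replacing any existing table of that name.
curl -F data=@unordered-data.csv 'http://localhost:9000/imp?name=unordered_data&overwrite=true'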
2. Create a table with the schema of the imported data and apply a partitioning strategy. The timestamp column may be cast as a timestamp if auto-detection of the timestamp failed:
CREATE TABLE ordered AS (
SELECT
cast(timestamp AS timestamp) timestamp,
col1,
col2
FROM 'unordered-data.csv' WHERE 1 != 1
) timestamp(timestamp) PARTITION BY DAY;
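Before copying, you can double-check which column types the import inferred. A minimal sketch, assuming the import kept the default table name and that the table_columns() meta-function is available in your version:
-- Inspect the column names and types QuestDB inferred for the import
SELECT * FROM table_columns('unordered-data.csv');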
3. Insert the unordered records into the partitioned table, providing a lag and batch size:
INSERT batch 100000 lag 180000000 INTO ordered
SELECT
cast(timestamp AS timestamp) timestamp,
col1,
col2
FROM 'unordered-data.csv';
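As a quick sanity check that all rows were copied, the row counts of the two tables can be compared (count() is QuestDB's row-count aggregate):
-- Both queries should return the same number
SELECT count() FROM 'unordered-data.csv';
SELECT count() FROM ordered;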
To confirm that the table is ordered, the isOrdered() function may be used:
SELECT isOrdered(timestamp) FROM ordered;
| isOrdered |
| --------- |
| true      |
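Once the ordered copy is verified, the intermediate import table can be dropped to reclaim disk space (assuming you no longer need the raw import):
-- Remove the staging table created by the CSV import
DROP TABLE 'unordered-data.csv';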
There is more info on loading data this way in the CSV import documentation.

- lag can be about 3 minutes in your case; it is the expected lateness of records, specified in microseconds (3 minutes = 180000000 microseconds, as in the example above)
- batch is the number of records to batch-process at one time

Upvotes: 2