Reputation: 271
I have a 100GB data file with sensor readings spanning a few weeks. The timestamps are not in strict chronological order and I'd like to bulk load the data into QuestDB. The order is not completely random, but some rows arrive up to three minutes late. Is there an efficient way to bulk load the data and ensure it is ordered chronologically at the same time?
Upvotes: 1
Views: 233
Reputation: 1315
The most efficient way to do this is in three steps:
1. Import the unordered dataset; you can do this via curl:
curl -F data=@unordered-data.csv 'http://localhost:9000/imp'
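The /imp endpoint also accepts optional query parameters. As a sketch (the name and overwrite parameters exist in the REST API, but check the docs for your QuestDB version), you can pick the target table name explicitly; the rest of this answer assumes the default name, which is the file name:
# Hypothetical variant: import into a table named "unordered_data",
# replacing any existing table of that name.
curl -F data=@unordered-data.csv 'http://localhost:9000/imp?name=unordered_data&overwrite=true'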
2. Create a table with the schema of the imported data and apply a partitioning strategy. The timestamp column may be cast as a timestamp if auto-detection of the timestamp failed:
CREATE TABLE ordered AS (
SELECT
cast(timestamp AS timestamp) timestamp,
col1,
col2
FROM 'unordered-data.csv' WHERE 1 != 1
) timestamp(timestamp) PARTITION BY DAY;
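Before copying, you can double-check which column types the import inferred. A minimal sketch, assuming the import kept the default table name and that the table_columns() meta-function is available in your version:
-- Inspect the column names and types QuestDB inferred for the import
SELECT * FROM table_columns('unordered-data.csv');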
3. Insert the unordered records into the partitioned table, providing a lag and batch size:
INSERT batch 100000 lag 180000000 INTO ordered
SELECT
cast(timestamp AS timestamp) timestamp,
col1,
col2
FROM 'unordered-data.csv';
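As a quick sanity check that all rows were copied, the row counts of the two tables can be compared (count() is QuestDB's row-count aggregate):
-- Both queries should return the same number
SELECT count() FROM 'unordered-data.csv';
SELECT count() FROM ordered;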
To confirm that the table is ordered, the isOrdered() function may be used:
SELECT isOrdered(timestamp) FROM ordered;
| isOrdered |
| --------- |
| true      |
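Once the ordered copy is verified, the intermediate import table can be dropped to reclaim disk space (assuming you no longer need the raw import):
-- Remove the staging table created by the CSV import
DROP TABLE 'unordered-data.csv';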
There is more info on loading data this way in the CSV import documentation.

- lag can be about 3 minutes in your case; it is the expected lateness of records, specified in microseconds (3 minutes = 180000000 microseconds, as in the example above)
- batch is the number of records to batch-process at one time

Upvotes: 2