Reputation: 167
I have a point of sale system that exports my data to a .csv which I then import into KDB. Currently what I do is export all the data from the POS to the csv and then create a single table. I have about 10 months of sales data and my csv file is about 11mb. As time grows I imagine that the csv file will be really huge and I wonder if this is inefficient.
At my old job what we would do is have a table for each individual day of data and then there would be an _hist table which would combine all the daily files. So if I wanted to just look at the data for the day I would look at the invoicedata table and if I wanted to look across all time I would look at the invoidata_hist table and set the query to look at date within (dateA;dateB). I'm wondering if I should set things up this way rather than the way I'm doing it now.
Am I better off having one very large csv file with all the data in it or should I create a csv file for each individual day? If the second way is better can anybody give me an idea of the best way to set this all up?
Thanks!
Upvotes: 2
Views: 1077
Reputation: 3229
If your total number of records are not going to exceed a few million then partitioning might be an overkill.
I'll think of partitioning the data if my daily table count is around 1 million or more.
You also need to consider how you are going to access the data, e.g. checking last-n
records of an in-frequent customer in date
partitioned table might impact your query performance as you have to iteratively lookback. In that case, a splayed or yearly partitioned table might be the right fit.
Having said that there are multiple ways you can store the data on disk, check out this link.
binary serialized (stores the table as a binary block)
`:/db/t set ([] ti:09:30:00 09:31:00; p:101.5 33.5)
splayed table (vertical/column partitioned )
`:/db/t/ set ([] ti:09:30:00 09:31:00; p:101.5 33.5) // trailing "/" in the file handle
partitioned table ( horizontally partitioned splayed table)
.Q.dpft[directory;partition;`p#field;tablename]
segmented tables ( advance partition for better query/map-reduce results)
.Q.dpft[directory;partition;`p#field;tablename]
save
function - to save data as binary/xml/csv/txt/xml format.
Since you have asked for a date partitioned table in the question, actually there are different ways you can partition your data :
long
column )You might want to store the data in a monthly partition based on the table count.
For saving the data to a partition, you can use .Q.dpft function
.Q.dpft[directory;partition;`p#field;tablename]
example from code.kx:
q)trade:([]sym:10?`a`b`c;time:.z.T+10*til 10;price:50f+10?50f;size:100*1+10?10)
q).Q.dpft[`:db;2007.07.23;`sym;`trade]
`trade
Upvotes: 2