Reputation: 2881
I need to output the write timestamp as part of a table export for lots of tables, but I cannot quite figure out a way that does not force me to explicitly select all columns in the statement.
Instead of being able to do just this:
SELECT *, writetime(data) AS timestamp FROM dls.licenses;
I have to do this:
SELECT column1, column2, ... , writetime(data) AS timestamp FROM dls.licenses;
This is pretty inconvenient since it means I'd have to change the export tool every time the schema of any of the tables changes.
Is there a better way?
Edit: To clarify, the actual error I get is the following. The way the syntax is presented in the error message, one could think the query should be OK:
SELECT *, writetime(id) AS timestamp FROM dls.licenses;
SyntaxException: line 1:8 mismatched input ',' expecting K_FROM (SELECT *[,]...)
Edit 2: Here is the keyspace and create statement used for this table:
CREATE KEYSPACE IF NOT EXISTS dls WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': '1' };
CREATE TABLE IF NOT EXISTS dls.licenses (subscription_id text, id text, key text, data text, PRIMARY KEY (key));
CREATE INDEX IF NOT EXISTS ON dls.licenses (id);
BTW: I'm using the freshly released Cassandra 4.0.0 (GA).
Upvotes: 1
Views: 513
Reputation: 16293
The star (*) in SELECT * is the CQL syntax for "ALL columns", so by definition it is not possible to include another column, since ALL of them are already selected; this holds even for native CQL functions. For this reason, you need to enumerate all column names + functions-on-columns.
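For the licenses table from the question, that fully enumerated query would look something like this (a sketch based on the schema shown in Edit 2; writetime() applies to regular columns such as data, not to primary key columns):

SELECT subscription_id, id, key, data, writetime(data) AS timestamp FROM dls.licenses;

If the concern is keeping the export tool in sync with schema changes, one option (an assumption about how the tool could be adapted, not something required by CQL) is to read the column names at runtime from system_schema.columns and build the SELECT from the result:

SELECT column_name, kind FROM system_schema.columns WHERE keyspace_name = 'dls' AND table_name = 'licenses';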
+1 to Yuki's answer. I wanted to add that DSBulk adds a WRITETIME() column for every column in the table because it isn't possible to know in advance the write-time of each column in the partition until the full partition has been read.
Allow me to explain it using a couple of examples.
Consider this table:
CREATE TABLE users_by_email (
email text,
name text,
address text,
mobile text,
PRIMARY KEY (email)
)
If we add a new record with a value specified for all columns:
INSERT INTO users_by_email (email, name, address, mobile)
VALUES ('[email protected]', 'Alice', '221B Baker St', '098-765-432-109');
then for this partition, all columns will have the same write-time.
Consider a situation where a record is fragmented across multiple inserts over a period of time such as:
INSERT INTO users_by_email (email, name) VALUES ('[email protected]', 'Bob');
INSERT INTO users_by_email (email, address) VALUES ('[email protected]', '350 Fifth Ave');
INSERT INTO users_by_email (email, mobile) VALUES ('[email protected]', '012-555-123-456');
Each of the columns name, address and mobile would have a different write-time.
From these 2 examples, you should see that there isn't always a single write-time that applies to all columns in the partition.
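One way to see this in cqlsh is to ask for the write-time of each regular column explicitly, as a sketch against the example table above (writetime() cannot be applied to the primary key column email, and '[email protected]' is the placeholder value used in the INSERT statements):

SELECT writetime(name), writetime(address), writetime(mobile) FROM users_by_email WHERE email = '[email protected]';

For the first record the three values would be identical; for the fragmented record they would differ.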
For your specific use case, you need to figure out from the DSBulk output which write-time to use for situations where the partition fragments are inserted/updated at different times. Cheers!
Upvotes: 0
Reputation: 641
If you are exporting to CSV or JSON files, you may consider using DataStax's dsbulk.
https://github.com/datastax/dsbulk
The latest version, dsbulk 1.8.0, added support for exporting writetime and ttl.
dsbulk unload -url myData.csv -k ks1 -t table1 --timestamp
Upvotes: 1
Reputation: 1
The WHERE clause specifies which rows must be queried. It is composed of relations on the columns that are part of the PRIMARY KEY and/or have a secondary index defined on them.
In Cassandra 3.6 and later, add ALLOW FILTERING to filter only on a non-indexed clustering column.
You may be able to solve your query problem by creating a secondary index on the column you want the writetime for. Keep in mind that secondary indexes create overhead, which may result in unintended consequences.
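As a sketch of that suggestion against the licenses table from the question (the filter value 'some-subscription' is hypothetical, and subscription_id is only picked as an example of a non-key column):

-- filter on a non-indexed column: requires ALLOW FILTERING
SELECT key, data, writetime(data) AS timestamp FROM dls.licenses WHERE subscription_id = 'some-subscription' ALLOW FILTERING;

-- or create a secondary index first, then filter without ALLOW FILTERING
CREATE INDEX IF NOT EXISTS ON dls.licenses (subscription_id);
SELECT key, data, writetime(data) AS timestamp FROM dls.licenses WHERE subscription_id = 'some-subscription';

Either way, the columns and their write-times still have to be listed explicitly in the SELECT.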
Upvotes: 0