Bruno Bronosky

Reputation: 70329

How can I query Cassandra without knowing what I'm going to find?

Just typing that title makes me wonder if I'm in a dimension where everything I've ever known about databases is wrong.

I have several tables, but this is an example:

CREATE TABLE stream (
  source_id uuid,
  yyyymmdd int,
  event_type text,
  time timestamp,
  data text,
  PRIMARY KEY ((source_id, yyyymmdd, event_type), time)
)

I have an idea of what might be in yyyymmdd, but not of what is in the other partition key columns. Without knowing the possible values for source_id and event_type, I can't query the table. What I ultimately want to know is:

What is the oldest yyyymmdd and the newest yyyymmdd in the db?

It's almost like I need a database of what is in my database.

Upvotes: 0

Views: 63

Answers (1)

Horia

Reputation: 2982

In cqlsh, switch to your keyspace (use <keyspace_name>) and run:

copy stream(yyyymmdd) to 'stream-yyyymmdd.csv' with NUMPROCESSES = 1 and MAXREQUESTS = 1;

Alternatively, if you don't want to run use <keyspace_name>, prefix the table name in the copy command with the keyspace name (<keyspace_name>.stream).

For NUMPROCESSES and MAXREQUESTS you can use whatever values suit you. Please refer to the COPY documentation. NUMPROCESSES is the number of worker processes; its maximum value is 16 and its default value is -1. MAXREQUESTS is the maximum number of requests each worker can process in parallel; its default value is 6.

Afterwards, sort the file and extract the first and last lines:

sort -n -o stream-yyyymmdd-sorted.csv stream-yyyymmdd.csv
head -1 stream-yyyymmdd-sorted.csv
tail -1 stream-yyyymmdd-sorted.csv
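
If the export is large, sorting the whole file just to read its two ends may be more work than needed. A rough sketch of a single-pass alternative with awk follows. It assumes the CSV has exactly one integer yyyymmdd value per line and no header row; minmax_csv is a made-up helper name, not part of cqlsh.

```shell
# Hypothetical helper: print "<oldest> <newest>" for a one-column CSV of yyyymmdd ints.
minmax_csv() {
  awk '
    { sub(/\r$/, "") }          # tolerate CRLF line endings
    NF == 0 { next }            # skip blank lines
    !seen { min = max = $1 + 0; seen = 1; next }
    { v = $1 + 0; if (v < min) min = v; if (v > max) max = v }
    END { if (seen) print min, max }
  ' "$1"
}
```

Usage would be something like minmax_csv stream-yyyymmdd.csv, which prints the smallest and largest values separated by a space and prints nothing for an empty file.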

HTH

Upvotes: 1
