Reputation: 1
I have 2 questions about AWS Keyspaces.
I have an important table 'posts' with this structure:
CREATE TABLE IF NOT EXISTS social_platform.posts (
id UUID,
user_id UUID,
title TEXT,
content TEXT,
created_at TIMESTAMP,
updated_at TIMESTAMP,
content_media_url TEXT,
user_username TEXT,
user_first_name TEXT,
user_last_name TEXT,
user_profile_picture TEXT,
user_role TEXT,
PRIMARY KEY (id)
);
Because AWS Keyspaces does not currently support materialized views or index I came up with a solution like this:
CREATE TABLE IF NOT EXISTS social_platform.posts_user_id_lookup (
id UUID,
user_id UUID,
PRIMARY KEY ((user_id), id)
) WITH CLUSTERING ORDER BY (id ASC);
Now I am also able to search for user specific posts.
Does this makes sense performance wise or is there a better solution to allow querying for multiple keys?
Obviously the post should be ordered by created_at. Currently I have another lookup table:
CREATE TABLE IF NOT EXISTS social_platform.posts_created_at_lookup (
date_partition DATE,
id UUID,
created_at TIMESTAMP,
PRIMARY KEY ((date_partition), created_at, id)
) WITH CLUSTERING ORDER BY (created_at DESC, id ASC);
But with this structure I need to fetch it like this:
const pageSize = parseInt(req.query.limit as string) || 10;
const lastTimestamp = req.query.lastTimestamp ? new Date(req.query.lastTimestamp as string) : new Date();
const datePartition = lastTimestamp.toISOString().split('T')[0]; // YYYY-MM-DD
console.log('Date partition:', lastTimestamp);
let query = 'SELECT id FROM posts_created_at_lookup';
let params = [];
if (lastTimestamp) {
query += ' WHERE date_partition = ? AND created_at <= ? ORDER BY created_at DESC, id ASC LIMIT ?';
params = [datePartition, lastTimestamp, pageSize];
} else {
query += ' WHERE date_partition = ? ORDER BY created_at DESC, id ASC LIMIT ?';
params = [datePartition, pageSize];
}
const timeResult = await client.execute(query, params, { prepare: true });
const orderedIds = timeResult.rows.map(row => row.id);
if (orderedIds.length === 0) {
return res.status(200).json([]);
}
const postsResult = await client.execute(
'SELECT * FROM posts WHERE id IN ?',
[orderedIds],
{ prepare: true }
);
const orderedPosts = orderedIds.map(id =>
postsResult.rows.find(post => post.id.equals(id))
);
const lastPost = orderedPosts[orderedPosts.length - 1];
const nextTimestamp = lastPost?.created_at || null;
console.log('Next timestamp:', nextTimestamp);
which to me does not seem very smart and fast. But because this is a common practice, there has to be well designed logic already, right?
Upvotes: -1
Views: 48
Reputation: 16353
Responding from an Apache Cassandra point of view, updates to materialised views are asynchronous so data written in the base tables may not be immediately available from the views.
Also there are some caveats to using MVs including the risk that a view can get out of sync with the base table so this feature has always been classified as experimental. I've explainyoued these in a bit more detail in Should we reconsider using materialized views in Cassandra?.
In relation to Cassandra's native secondary indexing, it might not be ideal if you are concerned about performance particularly when you have a large dataset (lots of posts, lots of users).
You have the right idea in designing another table for retrieving posts by a user. Your data model is close but the design can be a lot simpler.
Take this table of posts which is partitioned by user ID, sorted in reverse chronological order:
CREATE TABLE posts_by_userid (
userid uuid,
created timestamp,
postid uuid,
title text,
content text,
...
PRIMARY KEY (userid, created)
) WITH CLUSTERING ORDER BY (created DESC)
To retrieve the posts by a particular user, run:
SELECT postid, title, content FROM posts_by_userid
WHERE userid = ?
This query will return the most recent posts first and you then page through the results to retrieve the older entries.
Of course your application has to update multiple tables whenever users create new posts but that is a simple matter of batching the INSERT
s. See my other post on how to keep data in denormalised tables synchronised for details. Cheers!
DISCLOSURE: I'm an Apache Cassandra committer and I work at DataStax. I don't have anything to do with AWS or Amazon Keyspaces. 🙂
Upvotes: 0