Or Gal

Reputation: 1366

storing and deleting in cassandra wide row

i am using cassandra for a blogging app. one of my column families, UserFollowers, stores all the followers of a user. each row is a user and the columns are sorted keys for the followers, composed of firstname+lastname+uuid. the composite key lets me search ranges on the followers and serve them paginated.

example - followers of user A would look like:

A | john:2f432t3 | sam:f242fg | joe:f24gf24

all well and good so far. when i add a follower he falls into his sorted place and i can search and retrieve however i like. but now sam decided to stop being a follower and i need to delete him. moreover - just before that sam changed his name to samuel so the delete message i send now is samuel:f242fg. that value will not be found and the column sam:f242fg will stay.

my only solution for now is that when i want to delete, i pull out the entire row, locate sam by his id only, get the key that was stored initially and remove it. this is very inefficient for users with many followers, and it only holds up if these kinds of removals don't happen often.
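to make the workaround concrete, here is a minimal python sketch of it. the dict `row_a` is only an in-memory stand-in for one UserFollowers row, and `remove_follower_by_id` is a hypothetical helper name, not a cassandra client call.

```python
# In-memory stand-in for one UserFollowers row: column name -> value.
# Column names follow the "name:uuid" layout from the example above.
row_a = {
    "john:2f432t3": "",
    "sam:f242fg": "",
    "joe:f24gf24": "",
}

def remove_follower_by_id(row, follower_id):
    """Scan every column, match on the uuid suffix, and delete the match.

    Returns the column name that was removed, or None if nothing matched.
    The cost grows with the number of followers because the whole row is read.
    """
    for column_name in list(row):
        # rpartition splits on the last ':' so the uuid is always the tail
        _, _, stored_id = column_name.rpartition(":")
        if stored_id == follower_id:
            del row[column_name]
            return column_name
    return None

# A delete keyed on the new name misses: the stored column is still "sam:...".
assert "samuel:f242fg" not in row_a

removed = remove_follower_by_id(row_a, "f242fg")
```

the scan finds the column by id no matter what the name part says, which is exactly why it has to read the whole row.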

any better strategies out there?

thanks, or

Upvotes: 0

Views: 641

Answers (2)

Or Gal

Reputation: 1366

ok i think i've found a way to do it more efficiently. it requires a bit more work on the application side, but it works and allows deletions regardless of changes made to the source.

just to define the problem again:

  1. we have 2 entities that reference each other. example - User and Other Users. Users follow Other Users and Other Users are followed by Users.
  2. we want to store the related entities horizontally. so we have a CF UserFollowers that stores in each row all the followers of the user.
  3. we also have an inverse CF UserFollowing to store all the users this user is following.
  4. what we actually store is a column for each followed or following user where the name is a key composed of firstname:lastname:uuid and the value is a compact json of the user.
  5. now getting followers or following users is easy enough with range queries on the name.
  6. removing a user from either one of the lists is trickier, however, because we need to send a delete message with the original key that was stored.
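
the range-query pagination in step 5 can be sketched in python as below. this is only an in-memory illustration: `columns` stands in for the sorted column names of one row, `page_after` is a hypothetical helper, the two extra follower names are made up for the example, and it assumes the comparator orders names the same way python orders strings.

```python
from bisect import bisect_right

# Sorted column names of one hypothetical UserFollowers row.
columns = sorted([
    "abe:maxwell:fh2497h9",
    "joe:smith:f24gf24",
    "john:doe:2f432t3",
    "sam:jones:safg8sdfg",
])

def page_after(sorted_columns, last_seen, page_size):
    """Return up to page_size column names strictly after last_seen.

    Pass last_seen="" to fetch the first page.
    """
    start = bisect_right(sorted_columns, last_seen)
    return sorted_columns[start:start + page_size]

first_page = page_after(columns, "", 2)
# The last name of one page is the starting point of the next one.
second_page = page_after(columns, first_page[-1], 2)
```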

example: if sam:jones:safg8sdfg followed abe:maxwell:fh2497h9 we would have -

in UserFollowers: fh2497h9 | sam:jones:safg8sdfg<json for sam>
and in UserFollowing: safg8sdfg | abe:maxwell:fh2497h9<json for abe>

if sam changes his name to sammy and tries to unfollow abe, it won't work, because the delete message will now attempt to delete a column in UserFollowers with name sammy:jones:safg8sdfg when the actual column stored is sam:jones:safg8sdfg.

so my solution to this was to store a reverseKey with the stored json on each side so that each side knows what key was actually stored on the other side and can use that to remove itself from there.

it would look like:

in UserFollowers: fh2497h9 | sam:jones:safg8sdfg<json for sam..reverseKey:abe:maxwell:fh2497h9>
and in UserFollowing: safg8sdfg | abe:maxwell:fh2497h9<json for abe..reverseKey:sam:jones:safg8sdfg>

now when sam removes abe from his Following, he can use the reverseKey sam:jones:safg8sdfg to remove himself from abe's follower list.

and everyone is happy.
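
here is a minimal python sketch of the reverseKey idea, assuming two plain dicts as stand-ins for the column families. `column_key`, `follow` and `unfollow` are hypothetical helper names, and the json field names `first`, `last` and `id` are assumptions made for the example.

```python
import json

# In-memory stand-ins for the two CFs: row key -> {column name: json string}.
user_followers = {}   # followed user's id -> one column per follower
user_following = {}   # follower's id -> one column per followed user

def column_key(user):
    return f"{user['first']}:{user['last']}:{user['id']}"

def follow(follower, followed):
    follower_key = column_key(follower)
    followed_key = column_key(followed)
    # Each side's json carries the key that was written on the other side.
    user_followers.setdefault(followed["id"], {})[follower_key] = json.dumps(
        {**follower, "reverseKey": followed_key})
    user_following.setdefault(follower["id"], {})[followed_key] = json.dumps(
        {**followed, "reverseKey": follower_key})

def unfollow(follower_id, followed_key):
    """Remove both columns using only keys that were actually stored."""
    stored = json.loads(user_following[follower_id].pop(followed_key))
    followed_id = followed_key.rpartition(":")[2]
    del user_followers[followed_id][stored["reverseKey"]]

sam = {"first": "sam", "last": "jones", "id": "safg8sdfg"}
abe = {"first": "abe", "last": "maxwell", "id": "fh2497h9"}
follow(sam, abe)
sam["first"] = "sammy"  # rename after the follow was recorded
unfollow("safg8sdfg", "abe:maxwell:fh2497h9")
```

the unfollow never rebuilds a key from sam's current name. it reads the key out of the column sam already has in his own Following row.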

Upvotes: 0

rs_atl

Reputation: 8985

I suggest the following:

  1. Change your key on UserFollowers to an ID that represents the user.
  2. Add a "name" column that contains the name of that user.
  3. Instead of storing followers' names, store their IDs.

So your data now looks like this:

f1341df | name: george | 2f432t3 | f242fg | f24gf24
2f432t3 | name: john | f242fg | f1341df

... etc

Now you can get a list of followers' names by first querying the user and getting a list of IDs, then doing a multi-get with all those keys in a single query. If a user changes their name, this doesn't break your model.
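
As a rough illustration, the ID-keyed layout could be modelled in Python as below. This is a sketch using an in-memory dict rather than a Cassandra client; `multi_get_names`, `follower_names` and `unfollow` are hypothetical helper names, and the `followers` set is just one way to stand in for the ID columns.

```python
# In-memory stand-in: the row key is the user's ID, "name" holds the display
# name, and "followers" holds follower IDs only.
users = {
    "f1341df": {"name": "george", "followers": {"2f432t3", "f242fg", "f24gf24"}},
    "2f432t3": {"name": "john", "followers": {"f242fg", "f1341df"}},
    "f242fg": {"name": "sam", "followers": set()},
    "f24gf24": {"name": "joe", "followers": set()},
}

def multi_get_names(user_ids):
    """Stand-in for a multi-get: fetch the name column for many row keys."""
    return {uid: users[uid]["name"] for uid in user_ids if uid in users}

def follower_names(user_id):
    ids = users[user_id]["followers"]
    # Names are sorted here, in the application, not by the storage layout.
    return sorted(multi_get_names(ids).values())

def unfollow(user_id, follower_id):
    users[user_id]["followers"].discard(follower_id)

users["f242fg"]["name"] = "samuel"   # a rename touches one row only
names_after_rename = follower_names("f1341df")
unfollow("f1341df", "f242fg")        # delete by ID, unaffected by the rename
names_after_unfollow = follower_names("f1341df")
```

One trade-off visible in the sketch: followers are no longer stored in name order, so any name-sorted view has to be produced after the multi-get.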

Upvotes: 1
