Reputation: 105

How to model a many-to-many relationship in DynamoDB with the Java SDK

I have reviewed some articles about many-to-many relationships in DynamoDB.

As I understand it, there should be a single table, and I need to use a composite primary key.

For a Book item, the hash key (partition key) should be the book ID and the range key (sort key) should be the author ID. For an Author item, the hash key (partition key) should be the author ID and the range key (sort key) should be the book ID.

I have created these models:

@Data
@DynamoDBTable(tableName = "author_book_table")
public class Book {

    @Id
    private BookId id;

    private String name;

    private Integer pages;

}



@Data
@DynamoDBTable(tableName = "author_book_table")
public class Author {

    @Id
    private AuthorId id;
    @DynamoDBAttribute
    private String name;

}

The ID models are as follows:

@Data
public class BookId {

    @DynamoDBHashKey
    private String bookId;

    @DynamoDBRangeKey
    private String authorId;
}


@Data
public class AuthorId {
    @DynamoDBHashKey
    private String authorId;
    @DynamoDBRangeKey
    private String bookId;
}

But I can't see how to use them: how do I query books by author, or authors by book? What is the correct way to do it?

I could not find any examples of relationships in Java. I am also using the Spring Data DynamoDB module.

Upvotes: 0

Answers (1)

Itay Maman

Reputation: 30723

[Side note: are you sure the Spring Data DynamoDB module allows you to read/write two different @Data classes on the same table? If the answer is "no", you will need to use two separate tables. In any case, the rest of my answer is not affected by that, because you can certainly store more than one type of item in the same DDB table when using other SDKs.]

There are four use cases I can think of:

(a) given the ID of an author you want to get the IDs of all books she wrote

(b) given the ID of an author you want to get the names of all books she wrote

(c) given the ID of a book you want to get the IDs of all authors of the book

(d) given the ID of a book you want to get the names of all authors of the book

The quick answer is that, given this data model, (a) and (c) can each be achieved with a single query, whereas (b) and (d) cannot (see Answer 1 below). However, the "right" approach in DDB is to model things differently (see Answer 2 below).

Answer 1

When you query a DDB table you give it a partition key (aka hash key). The query returns all items with that partition key, sorted by the range key. As each author item has a book ID as its range key, you will get all book IDs when you pass in an author ID. Similarly, if you query on a given book ID you will get all author IDs.

If you also want to get the book names (from a given author ID) you will first have to get all book IDs (as explained in the previous paragraph) and then use BatchGetItem to get the individual book items. Note that BatchGetItem has an upper limit of 100 items per call, so you may need to make multiple BatchGetItem calls. (Of course, this solution also works in the other direction, book -> author: you just need to mentally swap book and author.)
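The batching step can be sketched in plain Java without the SDK. This is only an illustration: the class name BatchGetChunker and the ID format are made up, and the comment marks where a real BatchGetItem request would go.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchGetChunker {
    // BatchGetItem accepts at most 100 keys per request.
    static final int MAX_KEYS_PER_BATCH = 100;

    /** Splits the IDs into consecutive sublists of at most batchSize elements. */
    static List<List<String>> chunk(List<String> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Hypothetical book IDs, as if returned by the query on the author ID.
        List<String> bookIds = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            bookIds.add("book-" + i);
        }
        for (List<String> batch : chunk(bookIds, MAX_KEYS_PER_BATCH)) {
            // A real implementation would issue one BatchGetItem request here
            // and retry any unprocessed keys reported in the response.
            System.out.println("would fetch " + batch.size() + " items");
        }
    }
}
```

With 250 IDs this yields three batches of 100, 100 and 50 keys.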

Answer 2

In DDB, and in many other NoSQL databases, you can denormalize the data (that is, duplicate the same piece of information across multiple items) so that it is already stored in a shape that fits your retrieval use cases. Here, it boils down to having a single type of item that contains both author details and book details.

@Data
@DynamoDBTable(tableName = "author_book_table")
public class Book {

  @DynamoDBHashKey
  @DynamoDBIndexRangeKey(globalSecondaryIndexName="ByAuthor")
  private String bookId;

  @DynamoDBRangeKey
  @DynamoDBIndexHashKey(globalSecondaryIndexName="ByAuthor")
  private String authorId;

  private String bookName;
  private String authorName;
  private Integer pages;
}

Using this data model you can still get all authors of a book by issuing a query on the book ID. The items returned by the query will contain all author names. For the other direction (author ID -> books) you also need to do a query, but this time against a global secondary index (ByAuthor), which you need to define. In this index the roles are reversed: the author ID is the hash key and the book ID is the range key.
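To make the reversed key roles concrete, here is a small in-memory sketch of the idea. It is not DynamoDB code: the class AuthorBookIndexSketch and its methods are hypothetical, and the two maps merely stand in for the base table and the ByAuthor index. A real global secondary index is also updated asynchronously, whereas this sketch updates both maps at once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class AuthorBookIndexSketch {
    /** One denormalized item: a (book, author) pair carrying both names. */
    static final class Item {
        final String bookId, authorId, bookName, authorName;

        Item(String bookId, String authorId, String bookName, String authorName) {
            this.bookId = bookId;
            this.authorId = authorId;
            this.bookName = bookName;
            this.authorName = authorName;
        }
    }

    // Stand-in for the base table: partition key = bookId, sort key = authorId.
    private final Map<String, TreeMap<String, Item>> byBook = new HashMap<>();
    // Stand-in for the ByAuthor index: partition key = authorId, sort key = bookId.
    private final Map<String, TreeMap<String, Item>> byAuthor = new HashMap<>();

    void put(Item item) {
        byBook.computeIfAbsent(item.bookId, k -> new TreeMap<>()).put(item.authorId, item);
        byAuthor.computeIfAbsent(item.authorId, k -> new TreeMap<>()).put(item.bookId, item);
    }

    /** Like a query on the base table: all items for one book, ordered by authorId. */
    List<String> authorNamesOfBook(String bookId) {
        List<String> names = new ArrayList<>();
        TreeMap<String, Item> partition = byBook.get(bookId);
        if (partition != null) {
            for (Item item : partition.values()) {
                names.add(item.authorName);
            }
        }
        return names;
    }

    /** Like a query on the index: all items for one author, ordered by bookId. */
    List<String> bookNamesOfAuthor(String authorId) {
        List<String> names = new ArrayList<>();
        TreeMap<String, Item> partition = byAuthor.get(authorId);
        if (partition != null) {
            for (Item item : partition.values()) {
                names.add(item.bookName);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        AuthorBookIndexSketch table = new AuthorBookIndexSketch();
        table.put(new Item("b1", "a1", "Book One", "Alice"));
        table.put(new Item("b1", "a2", "Book One", "Bob"));
        table.put(new Item("b2", "a1", "Book Two", "Alice"));
        System.out.println(table.authorNamesOfBook("b1")); // [Alice, Bob]
        System.out.println(table.bookNamesOfAuthor("a1")); // [Book One, Book Two]
    }
}
```

Each lookup touches a single partition and already returns the names, which is why neither direction needs a follow-up BatchGetItem.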

The downside is that you need to update multiple records when a piece of data changes. For instance, if you need to update the name of author ID '100' from 'Alice' to 'Beth' you need to find all items with author ID '100' and update the author name there. Similarly, if you need to update the number of pages of a book, you will need to update multiple items (if the book has three authors, there will be three items with that book ID that need to be updated).

Important: you can issue these updates from your application/service. However, you need to be prepared for a situation where your service (or the underlying hardware) fails mid-update. This is likely to result in inconsistent data (in some items the author name is 'Beth' but in others it is still 'Alice'). Transactions can help, but they are limited in the number of items they can update (25 when this was written; the limit has since been raised to 100). If you cannot update everything in a single transaction you will need to take corrective measures: for instance, you can periodically scan the DB and fix any inconsistencies you find. On top of that, you can have the service proactively check for inconsistencies in the items it fetches during its "regular" operation. If it finds an inconsistency it can launch a fix on those specific items.
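The consistency check itself could look roughly like this. It is a sketch under assumptions: the items are taken to be already loaded into memory (for example from a scan or a regular query), and the class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class AuthorNameConsistencyCheck {
    /** The part of a denormalized item that matters for this check. */
    static final class Item {
        final String authorId, authorName;

        Item(String authorId, String authorName) {
            this.authorId = authorId;
            this.authorName = authorName;
        }
    }

    /** Returns every author ID that appears with more than one distinct name. */
    static Map<String, Set<String>> findConflictingAuthorNames(List<Item> items) {
        Map<String, Set<String>> namesByAuthor = new TreeMap<>();
        for (Item item : items) {
            namesByAuthor.computeIfAbsent(item.authorId, k -> new TreeSet<>())
                         .add(item.authorName);
        }
        // Authors with a single name are consistent, so drop them.
        namesByAuthor.values().removeIf(names -> names.size() < 2);
        return namesByAuthor;
    }

    public static void main(String[] args) {
        // Author '100' was renamed, but the update stopped after the first item.
        List<Item> items = Arrays.asList(
                new Item("100", "Beth"),
                new Item("100", "Alice"),
                new Item("200", "Carol"));
        System.out.println(findConflictingAuthorNames(items)); // {100=[Alice, Beth]}
    }
}
```

The check only detects the conflict. Deciding which name wins needs extra information, such as a last-updated timestamp stored on each item, which is not part of the model above.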

Upvotes: 1
