Reputation: 8127
I am looking for an AWS-centric solution (avoiding 3rd party stuff if possible) for visualizing data that is in a very simple DynamoDB table.
We use AWS QuickSight for many other reports and dashboards for our clients, so the goal is to make the visualizations available there.
I was very surprised to see that DynamoDB is not a supported source for QuickSight, although many others are, such as S3, Athena, Redshift, and RDS.
Does anyone have experience creating a solution for this?
I am thinking that I will just create a job that will dump the DynamoDB table to S3 every so often and then use the S3 or Athena integrations with Quicksight to read/display it. It would be nice to have a simple solution for more live data.
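A rough sketch of the conversion step such a dump job would need, assuming the table is read with a scan and written to S3 as newline-delimited JSON (one record per line is the layout Athena's JSON SerDes read). The function names and the sample items are hypothetical, and `fromAttributeValue` is only a simplified stand-in for the SDK's own `AWS.DynamoDB.Converter.unmarshall`:

```javascript
// Simplified stand-in for AWS.DynamoDB.Converter.unmarshall (common types only).
function fromAttributeValue(av) {
    if ('S' in av) return av.S;
    if ('N' in av) return Number(av.N);
    if ('BOOL' in av) return av.BOOL;
    if ('NULL' in av) return null;
    if ('L' in av) return av.L.map(fromAttributeValue);
    if ('M' in av) return unmarshallItem(av.M);
    if ('SS' in av) return [...av.SS];
    if ('NS' in av) return av.NS.map(Number);
    throw new Error('Unsupported attribute type: ' + Object.keys(av).join(','));
}

// Turn one DynamoDB-typed item into a plain JavaScript object.
function unmarshallItem(item) {
    const out = {};
    for (const [name, av] of Object.entries(item)) {
        out[name] = fromAttributeValue(av);
    }
    return out;
}

// One JSON object per line, ready to upload as a single S3 object.
function toJsonLines(items) {
    return items.map(item => JSON.stringify(unmarshallItem(item))).join('\n');
}

// Hypothetical items, shaped like the Items array a DynamoDB scan returns.
const sampleItems = [
    { OrderId: { S: 'o-1' }, Total: { N: '19.5' }, Paid: { BOOL: true } },
    { OrderId: { S: 'o-2' }, Total: { N: '7' }, Tags: { L: [{ S: 'gift' }] } }
];
console.log(toJsonLines(sampleItems));
```

The resulting string would be the body of the S3 upload; the scan and upload calls themselves are left out here.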
Upvotes: 33
Views: 35942
Reputation: 8127
!!UPDATE!! As of 2021, we can finally use Athena data connectors to expose DynamoDB data in QuickSight without any custom scripts or duplicate data.
That being said, I would like to caveat this: just because it can be done, you still need to ask yourself whether it is really a good solution for your workload. DynamoDB isn't the best fit for data warehousing use cases, and performing large scans on tables can end up being slow and costly. If your dataset is very large and this is a real production use case, it would probably still be best to go with an ETL workflow and move the DynamoDB data to a more appropriate data store.
But if you are still interested in seeing DynamoDB data live in QuickSight without any additional ETL processes to move or transform the data: I wrote a detailed blog post with step-by-step instructions, but in general, here is the process:
Bingo bango, you should now be able to directly query or cache DynamoDB data in Quicksight without needing to create custom code or jobs that duplicate your data to another data source.
As of March 2020, Amazon is making available a beta feature called Athena DynamoDB Connector.
Hopefully once this feature is GA, it can be easily imported into Quicksight and I can update the answer with the good news.
There are many new data sources that AWS is making available in beta for automating the connections to Athena.
You can set these up via the console by:
Now you can go to the Athena query editor, select the catalog you just created, and see a list of all DynamoDB tables for your region under the default Athena database in the new catalog. You can now query them as part of Athena.
Upvotes: 29
Reputation: 1124
Possible solutions are explained in other answers. Just wanted to discuss another point.
BI tools such as QuickSight are usually designed to be used on top of analytical data stores such as Redshift or S3. DynamoDB is not a very suitable data store for analytics purposes. Row-by-row operations such as "put" or "get" are very efficient, but bulk operations such as "scan" are expensive. If you are constantly doing scans during the day, your DynamoDB costs might grow fast.
One possible way is to cache the data in SPICE (QuickSight's in-memory cache). But a better way is to unload the data into a better-suited store such as S3 or Redshift. A couple of solutions are given in other answers.
Upvotes: 1
Reputation: 11
Would love to see DynamoDB integration with Quicksight. Using DynamoDB streams to dump to S3 doesn't work because DynamoDB streams send out events instead of updating records. Hence if you read from this S3 bucket you'll have two instances of the same item: one before update and one after update.
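To illustrate the duplicate problem, here is a minimal sketch of how the change events would have to be collapsed into one current row per item before querying. It assumes the stream is configured to include new images, that records are processed in order, and that the sample events (table key `Id`, attribute `Qty`) are made up for the example:

```javascript
// Reduce an ordered list of DynamoDB stream records to the latest state per key.
function latestState(records) {
    const byKey = new Map();
    for (const record of records) {
        // The primary key attributes identify which item the event belongs to.
        const key = JSON.stringify(record.dynamodb.Keys);
        if (record.eventName === 'REMOVE') {
            byKey.delete(key);
        } else {
            // INSERT and MODIFY both carry the item's new state.
            byKey.set(key, record.dynamodb.NewImage);
        }
    }
    return [...byKey.values()];
}

// Hypothetical events: item "a" is inserted then updated, item "b" is inserted then deleted.
const events = [
    { eventName: 'INSERT', dynamodb: { Keys: { Id: { S: 'a' } }, NewImage: { Id: { S: 'a' }, Qty: { N: '1' } } } },
    { eventName: 'MODIFY', dynamodb: { Keys: { Id: { S: 'a' } }, NewImage: { Id: { S: 'a' }, Qty: { N: '2' } } } },
    { eventName: 'INSERT', dynamodb: { Keys: { Id: { S: 'b' } }, NewImage: { Id: { S: 'b' }, Qty: { N: '5' } } } },
    { eventName: 'REMOVE', dynamodb: { Keys: { Id: { S: 'b' } } } }
];
console.log(JSON.stringify(latestState(events)));
```

Four events go in, but only the updated version of item "a" should remain. Dumping the raw events to S3 skips this step, which is why the same item shows up more than once.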
One solution that I see now is to dump data from DynamoDB to an S3 bucket periodically using Data Pipeline, and then use Athena and QuickSight on this S3 bucket.
A second solution is to use a DynamoDB stream to send data to Elasticsearch using a Lambda function. Elasticsearch has a companion tool called Kibana, which has pretty cool visualizations. Obviously this is going to increase your cost because now you are storing your data in two places.
Also make sure that you transform your data so that each Elasticsearch document holds the most granular data you need, because Kibana visualizations aggregate at the document level and cannot break a single document down further.
Upvotes: 1
Reputation: 1042
We want DynamoDB support in Quicksight!
The simplest way I could find is below:
1 - Create a Glue Crawler which takes DynamoDB table as a Data Source and writes documents to a Glue Table. (Let's say Table X)
2 - Create a Glue Job which takes 'Table X' as a data source and writes them into a S3 Bucket in parquet format. (Let's say s3://table-x-parquets)
3 - Create a Glue Crawler which takes 's3://table-x-parquets' as data source and creates a new Glue Table from it. (Let's say Table Y)
Now you can run Athena queries against Table Y, and you can also use it as a data set in QuickSight.
Upvotes: 12
Reputation: 61
I'd also like to see a native integration between DynamoDB and QuickSight, so I will be watching this thread as well.
But there is at least one option that's closer to what you want. You could enable Streams on your DynamoDB table and then set up a trigger that invokes a Lambda function whenever changes are made to the table.
The function can then act only on specific stream event types if you like (the `eventName` values are 'INSERT', 'MODIFY' and 'REMOVE') and dump the new or modified record to S3. That would be pretty close to real-time data, as it is triggered shortly after each update.
I did something similar in the past but instead of dumping data to S3 I was updating another DynamoDB table. It would be pretty simple to switch the example to S3 instead. See below.
const AWS = require('aws-sdk'); // AWS SDK for JavaScript v2
const dynamo = new AWS.DynamoDB();

exports.handler = async (event) => {
    console.log('Event:', JSON.stringify(event));

    // Load every customer ID. Scan results are paginated, so follow LastEvaluatedKey.
    const customers = [];
    let ExclusiveStartKey;
    do {
        const page = await dynamo.scan({
            TableName: 'Customers',
            ProjectionExpression: 'CustomerId',
            ExclusiveStartKey
        }).promise();
        customers.push(...page.Items.map(item => item.CustomerId.S));
        ExclusiveStartKey = page.LastEvaluatedKey;
    } while (ExclusiveStartKey);
    console.log(customers);

    for (const record of event.Records) {
        // Only react to newly inserted items that carry a NewImage.
        if (record.eventName !== 'INSERT' || !record.dynamodb.NewImage) continue;
        console.log(record.dynamodb.NewImage);

        // Copy the new item into the Rules table once per customer.
        // Errors are not swallowed, so a failed write fails the invocation and the batch is retried.
        for (const customerId of customers) {
            await dynamo.putItem({
                TableName: 'Rules',
                Item: {
                    ...record.dynamodb.NewImage,
                    CustomerId: { S: customerId }
                }
            }).promise();
        }
    }
};
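Switching the example to S3 might look roughly like the sketch below. It is untested against a real stream and makes a few assumptions: the stream includes new images, key values are safe to use in an S3 object name, and the `items/` prefix and `makeHandler` name are just placeholders. The S3 client is passed in, so with AWS SDK v2 you would wire it up with something like `exports.handler = makeHandler(new AWS.S3(), 'your-bucket-name')`.

```javascript
// Build a stream handler that mirrors each table item into S3 as one object per key.
// `s3` is any client exposing putObject/deleteObject in the AWS SDK v2 style (.promise()).
function makeHandler(s3, bucket) {
    return async (event) => {
        for (const record of event.Records) {
            // Derive a stable object name from the item's primary key values.
            const id = Object.values(record.dynamodb.Keys)
                .map(av => Object.values(av)[0])
                .join('_');
            const Key = `items/${id}.json`;

            if (record.eventName === 'REMOVE') {
                // Item deleted from the table: remove its copy from S3 as well.
                await s3.deleteObject({ Bucket: bucket, Key }).promise();
            } else {
                // INSERT and MODIFY overwrite the same object, so S3 keeps one copy per item.
                await s3.putObject({
                    Bucket: bucket,
                    Key,
                    Body: JSON.stringify(record.dynamodb.NewImage),
                    ContentType: 'application/json'
                }).promise();
            }
        }
    };
}
```

The body is stored in DynamoDB's typed JSON as it arrives from the stream; you would probably want to flatten it (for example with `AWS.DynamoDB.Converter.unmarshall`) before pointing Athena at the bucket.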
Upvotes: 5