Hans Desjarlais

Reputation: 135

Need recommendation to create an API by aggregating data from multiple source APIs

Before I start, I would like the community's advice on the best and most efficient way to go about this.

Here is what I want to do:

  1. Ingest data from multiple APIs that return JSON
  2. Store it in either S3 or DynamoDB
  3. Modify the data to use my JSON structure
  4. Pipe out the aggregate data as an API

The data will be updated twice a day, so I would pull in the data from the source APIs and put it through my pipeline twice a day.

So basically I want to create an API by aggregating data from multiple source APIs.

I've started playing with Lambda and created the following function using Python.

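# https://stackoverflow.com/a/41765656
import json

import requests

def lambda_handler(event, context):
    # https://www.nylas.com/blog/use-python-requests-module-rest-apis/ USEFUL!!!
    # https://stackoverflow.com/a/65896274
    # https://stackoverflow.com/questions/63733410/using-lambda-to-add-json-to-dynamodb DYNAMODB
    # Set a timeout so a slow source API cannot hang the function.
    response = requests.get("https://remoteok.com/api", timeout=10)
    # Raise on 4xx/5xx instead of passing an error payload along as data.
    response.raise_for_status()
    return {
        'statusCode': 200,
        # Serialize to a string: an API Gateway proxy integration expects a string body.
        'body': json.dumps(response.json())
    }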

This works and returns a JSON response.

Here are my questions:

  1. Should I store the data on S3 or DynamoDB?
  2. Which AWS service should I use to aggregate the data into my JSON structure?
  3. Which service should I use to publish the aggregate data as an API? API Gateway?

Before I go further, I would like to know the best way to go about this.

If you have experience with this I would love to hear from you.

Upvotes: 1

Views: 589

Answers (1)

Trick

Reputation: 31

The answer will vary depending on the quantity of data you're planning to mine. Lambda is designed for short-duration, high-frequency workloads (a single invocation is capped at 15 minutes), so it might not be suitable if each run has to pull and transform a lot of data.

I would recommend looking into AWS Glue, as this seems like a fairly typical ETL (Extract, Transform, Load) problem. You can set up Glue jobs to run on a schedule, and as for data aggregation, that's the T in ETL.
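As a rough illustration of that transform step, here is a minimal sketch in plain Python (it is not Glue-specific). The source names, the field names (`position`, `employer`, `link`, and so on), and the target structure are all hypothetical; substitute whatever your source APIs actually return.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Hypothetical per-source mappers: each converts one raw record into the
# common target structure. Every field name here is an assumption.
def from_source_a(raw: Record) -> Record:
    return {
        "source": "source_a",
        "title": raw["position"],
        "company": raw["company"],
        "url": raw["url"],
    }

def from_source_b(raw: Record) -> Record:
    return {
        "source": "source_b",
        "title": raw["title"],
        "company": raw["employer"],
        "url": raw["link"],
    }

MAPPERS: Dict[str, Callable[[Record], Record]] = {
    "source_a": from_source_a,
    "source_b": from_source_b,
}

def aggregate(raw_by_source: Dict[str, List[Record]]) -> List[Record]:
    """Normalize every record and merge them, keeping the first record per URL."""
    seen_urls = set()
    merged: List[Record] = []
    for source, records in raw_by_source.items():
        mapper = MAPPERS[source]
        for raw in records:
            item = mapper(raw)
            if item["url"] in seen_urls:
                continue  # duplicate listing already taken from an earlier source
            seen_urls.add(item["url"])
            merged.append(item)
    return merged
```

Keeping one small mapper per source means adding a new source API only requires one new function and one new entry in `MAPPERS`.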

It's simple to write the Glue DataFrame (the result of a transformation) out as S3 files, which Amazon Athena can then query directly, as if they were database tables.
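If you end up writing the S3 files yourself rather than from a Glue job, something like the following sketch may help. It serializes records as newline-delimited JSON (one object per line, which is the layout Athena's JSON SerDes read) and uploads them through a client supplied by the caller, intended to be `boto3.client("s3")`. The bucket name, key layout, and function names are assumptions.

```python
import json
from datetime import datetime, timezone

def to_json_lines(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)

def snapshot_key(prefix, when):
    # Hypothetical key layout: one object per run, grouped by date.
    return f"{prefix}/dt={when:%Y-%m-%d}/run-{when:%H%M%S}.json"

def store_snapshot(s3_client, bucket, prefix, records, when=None):
    """Upload one snapshot and return the key it was written to."""
    when = when or datetime.now(timezone.utc)
    key = snapshot_key(prefix, when)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=to_json_lines(records).encode("utf-8"),
        ContentType="application/x-ndjson",
    )
    return key
```

Passing the client in as an argument keeps the function easy to test with a stand-in object; in a real run you would call it as `store_snapshot(boto3.client("s3"), "my-bucket", "jobs", records)`.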

As for exposing that data via an API, the Serverless Framework and SST are both great tools for taking the sting out of spinning up a serverless API and its associated resources.
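Whichever tool deploys it, the function behind the endpoint can be small. Below is a minimal sketch of a Lambda handler for an API Gateway proxy integration, which expects a response with `statusCode`, `headers`, and a string `body`. `load_latest_records` is a hypothetical placeholder that returns hard-coded data; in practice it would read the latest aggregated snapshot from S3 or DynamoDB. The `company` query parameter is also an assumption.

```python
import json

def load_latest_records():
    # Hypothetical placeholder: replace with a read of the latest
    # aggregated snapshot from S3 or DynamoDB.
    return [
        {"title": "Backend Developer", "company": "Example Co"},
        {"title": "Data Engineer", "company": "Sample Inc"},
    ]

def lambda_handler(event, context):
    # The query string map can be missing or None when the request has none.
    params = event.get("queryStringParameters") or {}
    records = load_latest_records()

    # Optional case-insensitive filter, e.g. GET /jobs?company=example%20co
    company = params.get("company")
    if company:
        records = [r for r in records if r["company"].lower() == company.lower()]

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # The proxy integration expects the body as a string, not a dict.
        "body": json.dumps({"count": len(records), "items": records}),
    }
```

Because the data only changes twice a day, the handler just serves the stored aggregate; it never calls the source APIs at request time.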

Upvotes: 2
