A Simple Programmer
A Simple Programmer

Reputation: 644

RDS-data query to pandas dataframe

I have a serverless Aurora DB on AWS RDS (with data api enabled) that I would like to query using the database resource and secret ARNs. The code to do this is shown below.

rds_data = boto3.client('rds-data', region_name='us-west-1')

response = rds_data.execute_statement(
            resourceArn='DATABASE_ARN',
            secretArn='DATABASE_SECRET',
            database='db',
            sql=query)

rds_data.close()

The response contains the returned records, but the column names are not shown and the format is interesting (see below).

'records': [[{'stringValue': 'name'}, {'stringValue': '5'}, {'stringValue': '3'}, {'stringValue': '1'}, {'stringValue': '2'}, {'stringValue': '4'}]]

I want to query the aurora database and then create a pandas dataframe. Is there a better way to do this? I looked at just using psycopg2 and creating a database connection, but I believe you need a user name and password. I would like to not go that route and instead authenticate using IAM.

Upvotes: 1

Views: 1025

Answers (1)

A Simple Programmer
A Simple Programmer

Reputation: 644

The answer was actually pretty simple. There is a parameter called "formatRecordsAs" in the "execute_statement" function. If you set this to json, you can get back better records.

response = rds_data.execute_statement(
            resourceArn='DATABASE_ARN',
            secretArn='DATABASE_SECRET',
            database='db',
            formatRecordsAs='JSON',
            sql=query)

This still gives you back a string, so then you just need to convert that string representation to a list of dictionaries.

list_of_dicts = ast.literal_eval(response['formattedRecords'])

That can then be changed to a pandas dataframe

df = pd.DataFrame(list_of_dicts)

Upvotes: 1

Related Questions