jpetot
jpetot

Reputation: 155

Load a pandas table to dynamoDb

I am trying to load a big Pandas table to dynamoDB.

I have tried the for loop method as follow

for k in range(1000):
    trans = {}
    trans['Director'] = DL_dt['director_name'][k]
    trans['Language'] = DL_dt['original_language'][k]
    print("add :", DL_dt['director_name'][k] , DL_dt['original_language'][k])
    table.put_item(Item=trans)

it works but it's very time consuming. Is there a faster way to load it ? (equivalent of to_sql for sql database)

I've found the batchwriteitem function but i am not sure it works and i don't know exactly how to use it.

Thanks a lot.

Upvotes: 4

Views: 10515

Answers (3)

Wev
Wev

Reputation: 283

I did this using aws wrangler. It was a fairly simple process, the only tricky bit was handling pandas floats, so I converted them to decimals before loading the data in.

import awswrangler as wr

def float_to_decimal(num):
    return Decimal(str(num))

def pandas_to_dynamodb(df):
    df = df.fillna(0)
    # convert any floats to decimals
    for i in df.columns:
        datatype = df[i].dtype
        if datatype == 'float64':
            df[i] = df[i].apply(float_to_decimal)
    # write to dynamodb
    wr.dynamodb.put_df(df=df, table_name='table-name')

pandas_to_dynamodb(df)

Upvotes: 4

Leon Moya
Leon Moya

Reputation: 56

You can iterate over the dataframe rows, transform each row to json and then convert it to a dict using json.loads, this will also avoid the numpy data type errors.

you can try this:

import json
from decimal import Decimal
DL_dt = DL_dt.rename(columns={
    'director_name': 'Director',
    'original_language': 'Language'
})
with table.batch_writer() as batch:
    for index, row in DL_dt.iterrows():
        batch.put_item(json.loads(row.to_json(), parse_float=Decimal))

Upvotes: 4

Jonathan Leon
Jonathan Leon

Reputation: 5648

Batch writer docs here.

Try this:

with table.batch_writer() as batch:
    for k in range(1000):
        trans = {}
        trans['Director'] = DL_dt['director_name'][k]
        trans['Language'] = DL_dt['original_language'][k]
        print("add :", DL_dt['director_name'][k] , DL_dt['original_language'][k])
        batch.put_item(trans))

Upvotes: 1

Related Questions