Reputation: 155
I am trying to load a big pandas DataFrame into DynamoDB.
I have tried the for-loop method as follows:
for k in range(1000):
    trans = {}
    trans['Director'] = DL_dt['director_name'][k]
    trans['Language'] = DL_dt['original_language'][k]
    print("add :", DL_dt['director_name'][k], DL_dt['original_language'][k])
    table.put_item(Item=trans)
It works, but it's very time-consuming.
Is there a faster way to load it (an equivalent of to_sql for SQL databases)?
I've found the BatchWriteItem operation, but I'm not sure how it works and I don't know exactly how to use it.
Thanks a lot.
Upvotes: 4
Views: 10515
Reputation: 283
I did this using AWS Data Wrangler (awswrangler). It was a fairly simple process; the only tricky bit was handling pandas floats, so I converted them to Decimals before loading the data in.
import awswrangler as wr
from decimal import Decimal

def float_to_decimal(num):
    return Decimal(str(num))

def pandas_to_dynamodb(df):
    df = df.fillna(0)
    # convert any floats to decimals (DynamoDB does not accept Python floats)
    for i in df.columns:
        datatype = df[i].dtype
        if datatype == 'float64':
            df[i] = df[i].apply(float_to_decimal)
    # write to dynamodb
    wr.dynamodb.put_df(df=df, table_name='table-name')

pandas_to_dynamodb(df)
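To apply this to the DataFrame from the question, a minimal sketch might look like the following (it assumes the target table's attributes are named Director and Language, and that table_name inside pandas_to_dynamodb points at that table):

# rename the question's columns to match the table's attribute names,
# then reuse the helper above (table_name is set inside pandas_to_dynamodb)
df = DL_dt.rename(columns={'director_name': 'Director',
                           'original_language': 'Language'})
pandas_to_dynamodb(df)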
Upvotes: 4
Reputation: 56
You can iterate over the DataFrame rows, transform each row to JSON, and then convert it to a dict using json.loads; this also avoids the NumPy data type errors.
You can try this:
import json
from decimal import Decimal

DL_dt = DL_dt.rename(columns={
    'director_name': 'Director',
    'original_language': 'Language'
})

with table.batch_writer() as batch:
    for index, row in DL_dt.iterrows():
        batch.put_item(Item=json.loads(row.to_json(), parse_float=Decimal))
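batch_writer buffers the puts and sends them in BatchWriteItem requests of up to 25 items, retrying unprocessed items for you. If the DataFrame can contain rows that repeat the table's primary key, boto3 can also deduplicate them within a batch; here is a sketch assuming Director is the partition key (that key choice is an assumption about your table's schema):

# de-duplicate buffered items that share the same primary key;
# 'Director' as the partition key is an assumption
with table.batch_writer(overwrite_by_pkeys=['Director']) as batch:
    for index, row in DL_dt.iterrows():
        batch.put_item(Item=json.loads(row.to_json(), parse_float=Decimal))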
Upvotes: 4
Reputation: 5648
Batch writer docs here.
Try this:
with table.batch_writer() as batch:
    for k in range(1000):
        trans = {}
        trans['Director'] = DL_dt['director_name'][k]
        trans['Language'] = DL_dt['original_language'][k]
        print("add :", DL_dt['director_name'][k], DL_dt['original_language'][k])
        batch.put_item(Item=trans)
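As a follow-up sketch, the hard-coded range(1000) can be replaced with the DataFrame's length so the whole table gets loaded (this assumes DL_dt and table are defined as in the question):

# load every row rather than only the first 1000;
# assumes DL_dt and table are defined as in the question
with table.batch_writer() as batch:
    for k in range(len(DL_dt)):
        batch.put_item(Item={
            'Director': DL_dt['director_name'][k],
            'Language': DL_dt['original_language'][k],
        })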
Upvotes: 1