Reputation: 155
I want to create a table in a DuckDB database from a Mongo collection in Python, for further analytics. Currently I do the following:
import json
import duckdb

# materialize the entire cursor and dump it as one big JSON document
with open("mongo_json.jsonl", "w") as file:
    json.dump(list(mongo_cursor), file, default=str)

duckdb.sql("CREATE OR REPLACE TABLE mongo_table AS "
           "SELECT * FROM read_json_auto('mongo_json.jsonl', IGNORE_ERRORS=true)")
The problem is that the JSON is really big, which increases memory consumption. Are there any ideas or a better approach to achieve this?
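For context, the export step above can be written to stream one document per line (true JSON Lines) instead of materializing the whole cursor first. This is a minimal stdlib-only sketch; the generator of dicts is a hypothetical stand-in for a real pymongo cursor:

```python
import json

def dump_jsonl(cursor, path):
    # Write one JSON object per line; only a single document is held
    # in memory at a time instead of the full list(mongo_cursor).
    with open(path, "w") as f:
        for doc in cursor:
            f.write(json.dumps(doc, default=str))
            f.write("\n")

# hypothetical stand-in for a pymongo cursor
docs = ({"_id": i, "value": i * 2} for i in range(3))
dump_jsonl(docs, "mongo_json.jsonl")
```

The resulting file is line-delimited, which is also the format DuckDB's `read_json_auto` handles most naturally for large inputs.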
Upvotes: 2
Views: 915
Reputation: 397
If your data can fit into memory, check out pymongoarrow (link). You can use it to fetch Arrow tables from Mongo, which DuckDB can ingest directly. You might even be able to do this in chunks to avoid going OOM.
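A sketch of the chunked route, assuming a running MongoDB plus the `pymongoarrow` and `duckdb` packages. `aggregate_arrow_all` is a real pymongoarrow helper, but the chunking pipeline here ($sort/$skip/$limit windows keyed on `_id`) is illustrative, not tested against a live server:

```python
def chunk_windows(total, size):
    # Yield ($skip, $limit) pairs covering `total` documents in order.
    for skip in range(0, total, size):
        yield skip, min(size, total - skip)

def ingest_in_chunks(collection, con, chunk_size=100_000):
    # Pull the collection into DuckDB one Arrow table at a time,
    # keeping only one chunk in memory per iteration.
    from pymongoarrow.api import aggregate_arrow_all

    total = collection.estimated_document_count()
    first = True
    for skip, limit in chunk_windows(total, chunk_size):
        tbl = aggregate_arrow_all(
            collection,
            # sort on _id so the skip/limit windows are deterministic
            [{"$sort": {"_id": 1}}, {"$skip": skip}, {"$limit": limit}],
        )
        if first:
            # DuckDB can scan the local pyarrow table `tbl` by name
            con.sql("CREATE OR REPLACE TABLE mongo_table AS SELECT * FROM tbl")
            first = False
        else:
            con.sql("INSERT INTO mongo_table SELECT * FROM tbl")
```

The first chunk creates the table (so the schema is inferred from Arrow), and later chunks append to it.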
Upvotes: 1