MuGh

Reputation: 155

Mongo collection to DuckDB in Python

I want to create a table in a DuckDB database from a MongoDB collection in Python for further analytics. Currently I do the following:

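import json
import duckdb

# mongo_cursor is an existing pymongo cursor; list() loads every document into memory
with open("mongo_json.jsonl", "w") as file:
    json.dump(list(mongo_cursor), file, default=str)

duckdb.sql("CREATE OR REPLACE TABLE mongo_table AS SELECT * FROM read_json_auto('mongo_json.jsonl', IGNORE_ERRORS=true)")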

The problem is that the JSON is really big, which drives up memory consumption. Is there a better approach to achieve this?

Upvotes: 2

Views: 915
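The export code in the question calls `list(mongo_cursor)`, which holds every document in Python at once, and `json.dump` then writes a single JSON array despite the `.jsonl` file name. A minimal sketch of a lower-memory export, assuming the cursor can simply be iterated: write one document per line (newline-delimited JSON) so only one document is serialised at a time. `write_ndjson` is a hypothetical helper name, not part of pymongo or DuckDB.

```python
import json
from typing import Any, Dict, Iterable


def write_ndjson(docs: Iterable[Dict[str, Any]], path: str) -> int:
    """Write each document as one JSON line; return the number of lines written.

    `docs` is consumed lazily (e.g. a pymongo cursor), so the full result set
    is never built as a list. default=str stringifies non-JSON values such as
    ObjectId or datetime, as in the question.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps(doc, default=str) + "\n")
            count += 1
    return count
```

Passing the cursor directly, e.g. `write_ndjson(mongo_cursor, "mongo_json.jsonl")`, produces a file that the question's `read_json_auto` call should still read, since DuckDB's JSON reader auto-detects newline-delimited input.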

Answers (1)

Moritz Wilksch

Reputation: 397

If your data fits into memory, check out pymongoarrow (link). You can use it to fetch Arrow tables from MongoDB, which DuckDB can ingest directly. You might even be able to do this in chunks to avoid running out of memory (OOM).

Upvotes: 1
