Reputation: 550
I am trying to select data from our main database (Postgres) and insert it into a temporary SQLite database for some comparison, analytics and reporting. Is there an easy way to do this in Python? I am trying to do something like this:
Get data from the main Postgres db:
import psycopg2
postgres_conn = psycopg2.connect(connection_string)
from_cursor = postgres_conn.cursor()
from_cursor.execute("SELECT email, firstname, lastname FROM schemaname.tablename")
Insert into SQLite table:
import sqlite3
sqlite_conn = sqlite3.connect(db_file)
to_cursor = sqlite_conn.cursor()
insert_query = "INSERT INTO sqlite_tablename (email, firstname, lastname) values %s"
to_cursor.some_insert_function(insert_query, from_cursor)
So the question is: is there a some_insert_function that would work for this scenario (either using pyodbc or using sqlite3)? If yes, how do I use it? Would the insert_query above work, or should it be modified?
Any other suggestions/approaches would also be appreciated in case a function like this doesn't exist in Python. Thanks in advance!
Upvotes: 3
Views: 3176
Reputation: 38335
In addition to @JohnStanesa's great answer, I'd like to add that the cursor's arraysize attribute can affect performance significantly.
The documentation says:
The number of rows to fetch is specified by the parameter. If it is not given, the cursor’s arraysize
attribute determines the number of rows to be fetched. If the number of rows available to be fetched is fewer than the amount requested, fewer rows will be returned.
Set arraysize before executing the cursor:
from_cursor.arraysize = 100000  # DB-API default is 1; increase for better performance
# after that, run a fetchmany/executemany loop
while True:
    current_data = from_cursor.fetchmany()  # with no argument, fetches arraysize rows
    if not current_data:
        break
    to_cursor.executemany(insert_query, current_data)
    sqlite_conn.commit()
sqlite_conn.commit()
Note that cursor.arraysize only affects fetches; there is no need to set it for inserts.
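As a runnable sketch of the loop above (sqlite3 stands in for both the source and the target here, since psycopg2 needs a live Postgres server; the users table and its 250 sample rows are made up for illustration — any DB-API 2.0 source cursor should behave the same way):

```python
import sqlite3

# In-memory source database standing in for Postgres.
src_conn = sqlite3.connect(":memory:")
src_conn.execute("CREATE TABLE users (email TEXT, firstname TEXT, lastname TEXT)")
src_conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(f"u{i}@example.com", "First", "Last") for i in range(250)],
)

# In-memory target database standing in for the temporary SQLite file.
dst_conn = sqlite3.connect(":memory:")
dst_conn.execute("CREATE TABLE users (email TEXT, firstname TEXT, lastname TEXT)")
to_cursor = dst_conn.cursor()

from_cursor = src_conn.cursor()
from_cursor.arraysize = 100  # sqlite3's default is 1
from_cursor.execute("SELECT email, firstname, lastname FROM users")

insert_query = "INSERT INTO users (email, firstname, lastname) VALUES (?, ?, ?)"
while True:
    batch = from_cursor.fetchmany()  # no argument: fetches arraysize rows
    if not batch:
        break
    to_cursor.executemany(insert_query, batch)
    dst_conn.commit()

print(to_cursor.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 250
```

With arraysize set to 100, the loop above moves the 250 rows in three batches (100, 100, 50) instead of one row at a time.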
Upvotes: 0
Reputation: 128
You should pass the result of your select query to executemany.
insert_query = "INSERT INTO smallUsers values (?,?,?)"
to_cursor.executemany(insert_query, from_cursor.fetchall())
You should also use a parameterized query (? placeholders), as explained here: https://docs.python.org/3/library/sqlite3.html#sqlite3.Cursor.execute
If you want to avoid loading the entire source database into memory, you can use the following code to process 100 rows at a time:
while True:
    current_data = from_cursor.fetchmany(100)
    if not current_data:
        break
    to_cursor.executemany(insert_query, current_data)
    sqlite_conn.commit()
sqlite_conn.commit()
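As an aside, a sqlite3 connection can also be used as a context manager, which commits automatically on success and rolls back if an exception escapes, so the explicit commit() calls can be dropped. A minimal sketch (the smallUsers table and sample rows are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE smallUsers (email TEXT, firstname TEXT, lastname TEXT)")

rows = [("a@example.com", "Ann", "Ames"), ("b@example.com", "Bob", "Boon")]

# The with-block commits on normal exit and rolls back on error,
# replacing the manual conn.commit() calls.
with conn:
    conn.executemany("INSERT INTO smallUsers VALUES (?, ?, ?)", rows)

print(conn.execute("SELECT COUNT(*) FROM smallUsers").fetchone()[0])  # 2
```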
Upvotes: 5
Reputation: 1009
You can look at executemany from pyodbc or sqlite3. If you can build a list of parameters from your select, you can pass that list to executemany.
Depending on the number of records you plan to insert, performance can be a problem, as referenced in this open issue: https://github.com/mkleehammer/pyodbc/issues/120
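To illustrate "build a list of parameters from your select, then pass it to executemany" (sqlite3 stands in for both databases here, and the src/dst tables with their two sample rows are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE src (email TEXT, firstname TEXT, lastname TEXT)")
cur.execute("CREATE TABLE dst (email TEXT, firstname TEXT, lastname TEXT)")
cur.executemany(
    "INSERT INTO src VALUES (?, ?, ?)",
    [("c@example.com", "Cy", "Cole"), ("d@example.com", "Di", "Dean")],
)

# Build the list of parameter tuples from the select...
params = cur.execute("SELECT email, firstname, lastname FROM src").fetchall()
# ...and hand the whole list to executemany in one call.
cur.executemany("INSERT INTO dst VALUES (?, ?, ?)", params)
conn.commit()

print(cur.execute("SELECT COUNT(*) FROM dst").fetchone()[0])  # 2
```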
Upvotes: 1