Reputation: 3135
Pandas has a great feature, where you can write your dataframe to a table in SQL.
df.to_sql(con=cnx, name='some_table_name', if_exists='replace', flavor='mysql', index=False)
Is there a way to make a temporary table this way?
There is nothing in the documentation as far as I can tell.
Upvotes: 12
Views: 17236
Reputation: 4699
This was a quick and easy workaround for me.
Simply apply a RegEx to the generated SQL to add in whatever statements you want.
import io
import pandas as pd
# Get the SQL that would be generated by the create table statement
create_table_sql = pd.io.sql.get_schema(df, tmp_table_name)
# Replace the `CREATE TABLE` part of the generated statement with
# whatever you need.
create_tmp_table_sql = re.sub(
"^(CREATE TABLE)?",
"CREATE TEMP TABLE",
create_table_sql
)
Then, you can use it like this:
# Write to the database in a transaction (psycopg2)
with conn.cursor() as cur:
cur.execute(create_tmp_table_sql)
output = io.StringIO()
df.to_csv(output, sep="\t", header=False, index=False, na_rep="NULL")
output.seek(0)
cur.copy_from(output, tmp_table_name, null="NULL")
Credit to Aseem for a fast way to write to Postgres.
Upvotes: 7
Reputation: 1922
This may be a bit hacky and it doesn't technically create a temporary table, it just acts like one, but you could create use the @contextmanager
decorator from contextlib
to create the table upon opening the context and drop it upon close. Could look something like:
from contextlib import contextmanager
import numpy as np
import sqlalchemy as sqla
import pandas as pd
@contextmanager
def temp_table(frame, tbl, eng, *args, **kwargs):
frame.to_sql(tbl, eng, *args, **kwargs)
yield
eng.execute('DROP TABLE {}'.format(tbl))
df = pd.DataFrame(np.random.randint(21, size=(10, 10)))
cnx = sqla.create_engine(conn_string)
with temp_table(df, 'some_table_name', cnx, if_exists='replace', flavor='mysql', index=False):
# do stuff with "some_table_name"
I tested it using Teradata and it works fine. I don't have a MySQL laying around that I can test it out on, but as long as DROP
statements work in MySQL, it should work as intended.
Upvotes: 1
Reputation: 474031
The DataFrame.to_sql()
uses the built into pandas pandas.io.sql
package, which itself relies on the SQLAlchemy as a database abstraction layer. In order to create a "temporary" table in SQLAlchemy ORM, you need to supply a prefix:
t = Table(
't', metadata,
Column('id', Integer, primary_key=True),
# ...
prefixes=['TEMPORARY'],
)
From what I see, pandas.io.sql
does not allow you to specify the prefixes
or easily change the way tables are created.
One way to approach this problem would be to create the temporary table beforehand and use to_sql()
with if_exists="append"
(all using the same database connection).
Here is also what I've tried to do: override the pandas.io.sql.SQLTable
's _create_table_setup()
method and pass the prefixes
to the Table
constructor. For some reason, the table was still created non-temporary. Not sure if it would help, but here is the code I was using: gist. This is kind of hacky, but I hope it would at least serve as an example code to get you started on this approach.
Upvotes: 21