Reputation: 339
I'm trying to perform a SQL join on the the contents of a dataframe with an external table I have in a Postgres Database.
This is what the Dataframe looks like:
>>> df
name author count
0 a b 10
1 c d 5
2 e f 2
I need to join it with a Postgres table that looks like this:
TABLE: blog
title author url
a b w.com
b b x.com
e g y.com
This is what I'm attempting to do, but this doesn't appear to be the right syntax for the query:
>>> sql_join = r"""select b.*, frame.* from ({0}) frame
join blog b
on frame.name = b.title
where frame.owner = b.owner
order by frame.count desc
limit 30;""".format(df)
>>> res = pd.read_sql(sql_join, connection)
I'm not sure how I can use the values in the dataframes within the sql query. Can someone point me in the right direction? Thanks!
Edit: As per my use case, I'm not able to convert the blog table into a dataframe given memory and performance constraints.
Upvotes: 7
Views: 20677
Reputation: 401
I had a similar issue using the SQLAlchemy package and SQLite database connection. This answer did not work for me. Adapting it a bit, I was able to achieve the desired result:
# Import packages
import pandas as pd
# from sqlalchemy import create_engine
# Create sample DataFrame
df = pd.DataFrame(data=[[1, 'A'], [2, 'B'], [3, 'C']], index=None, columns=None, dtype=None)
# Create a list of tuples from the DataFrame
data = str([tuple(x) for x in df.values]).replace('[', '').replace(']', '')
# Create a SQL table from the DataFrame (incl. column names)
inner_table = """
SELECT [column1] AS column_a,
CAST([column2] AS TEXT) AS column_b
FROM (VALUES {})
""".format(data)
# Query
query = """
SELECT table.*
FROM table
INNER JOIN ({}) AS inner_table
ON table.column_a = inner_table.column_a
""".format(inner_table)
# Read SQL table
pd.read_sql_query(sql=query, con=connection, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None, dtype=None)
Upvotes: 0
Reputation: 1
I've had similar problems. I found a work-around that allows me to join two different servers where i only have read-only rights. using sqlalchemy insert the pandas dataframe and then join
import sqlalchemy as sa
import pandas as pd
metadata = MetaData()
sql_of_df = sa.Table(
"##df",
metadata,
sa.Column("name", sa.String(x), primary_key=True),
sa.Column("author", sa.String(x), nullable=False),
sa.Columnt("count", sa.Integer),
)
metadata.create_all(engine)
dataframe_dict = df.to_dict(orient='records')
insert_statement = sql_of_df.insert().values(
{
"name":sa.bindparam("name"),
"author":sa.bindparam("author"),
"count":sa.bindparam("count"),
}
)
session.execute(insert_statement, dataframe_dict)
statement=sa.text("SELECT * from blog Inner join ##df on blog.Title = ##df.name")
session.execute(statement)
Upvotes: 0
Reputation: 339
I managed to do this without having to convert the dataframe to a temp table or without reading SQL into a dataframe from the blog table.
For anyone else facing the same issue, this is achieved using a virtual table of sorts.
This is what my final sql query looks like this:
>>> inner_string = "VALUES ('a','b',10), ('c','d',5), ('e','f',2)"
>>> sql_join = r"""SELECT * FROM blog
JOIN ({0}) AS frame(title, owner, count)
ON blog.title = frame.title
WHERE blog.owner = frame.owner
ORDER BY frame.count DESC
LIMIT 30;""".format(inner_string)
>>> res = pd.read_sql(sql_join, connection)
You can use string manipulation to convert all rows in the dataframe into one large string similar to inner_string
.
Upvotes: 10
Reputation: 34046
You should create another dataframe from the Postgres table and then join both dataframes.
You can use read_sql
to create a df from table:
import psycopg2 ## Python connector library to Postgres
import pandas as pd
conn = psycopg2.connect(...) ## Put your DB credentials here
blog_df = pd.read_sql('blog', con=conn)
## This will bring `blog` table's data into blog_df
It should look like this:
In [258]: blog_df
Out[258]:
title author url
0 a b w.com
1 b b x.com
2 e g y.com
Now, you can join df
and blog_df
using merge
like below:
In [261]: pd.merge(df, blog_df, left_on='name', right_on='title')
Out[261]:
name author_x count title author_y url
0 a b 10 a b w.com
1 e f 2 e g y.com
You will get result like above. You can clean it further.
Let me know if this helps.
Upvotes: 2