Reputation: 624
Imagine a table primary keyed by date and name.
|Date|Name|Value|
|D1 | A| 2|
|D1 | B| 3|
|D2 | A| 3|
|D2 | C| 1|
How would one write a SQLAlchemy query yielding (date, value) pairs where the values are all of the values for all of the names on that date? The length of the value tuple would be len(distinct(Name))
and missing values would contain some sentinel value (e.g. None).
i.e.
[
(D1, 2, 3, None),
(D2, 3, None, 1),
...
]
I'm not asking for a full answer here in order to get me to approve. If you can tell me what concepts to search for/read in the documentation (or really, point me to anything useful or related), I'll (try to) figure it out myself and post an update.
[No pandas solutions please]
The easy answer is to load the query in long form and do the pivoting in numpy using numpy.searchsorted, but if the submitted answer uses this, I would be interested in some performance diagnostics to compare with the simple solution.
Upvotes: 1
Views: 7467
Reputation: 77032
I would recommend not to do it in on the SQL Server
side, unless you have a huge amount of data (and even in this case I am not sure it has significant positive performance impact).
Instead, use different tools where they excel:
MS-SQL
(or other RDBMS) to store dataSQLAlchemy
to insert/query that datanumpy
routines or pandas.pivot
to pivot the dataThe code below is self-contained, running (with sqlite
) and shows how to do get to the final result you desire:
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
import pandas as pd
_db_uri = 'sqlite:///:memory:'
engine = create_engine(_db_uri, echo=True)
Session = sessionmaker(bind=engine)
Base = declarative_base()
class MyTable(Base):
__tablename__ = 'my_table'
id = Column(Integer, primary_key=True)
date = Column(String)
name = Column(String)
value = Column(Integer)
def _test():
Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)
session = Session()
def _add_test_data():
rows = [
MyTable(date="D1", name="A", value=2),
MyTable(date="D1", name="B", value=3),
MyTable(date="D2", name="A", value=3),
MyTable(date="D2", name="C", value=1),
]
session.add_all(rows)
session.commit()
# create test data
_add_test_data()
# use `sa` to query data from the database
# q = session.query(MyTable) # all columns
q = session.query(MyTable.date, MyTable.name, MyTable.value) # explicit
# read data into pandas directly from the query `q`
df = pd.read_sql(q.statement, q.session.bind)
print(df)
# pivot the results
df_pivot = df.pivot(index="date", columns="name", values="value")
print(df_pivot)
if __name__ == '__main__':
_test()
Upvotes: 9