Jordan
Jordan

Reputation: 305

Join and add data in SQL and sqlalchemy

I am new to MySQL and sqlalchemy.

So Subquery_A gives me ~1,000 results in form:

 [(customer_id, days_a_customer)...]

Subquery_B gives me ~30,000 results in form:

 [(customer_id, transaction_id)...]

Subquery_A represents customer ids with a "days_a_customer" value greater than 30. Subquery_B represents transaction events across all customers.

How can I merge these two subqueries so that I get all rows in subquery_B that have a "customer_id" value in subquery_A as well as the "days_a_customer" from subquery_A i.e:

Subquery_C = [(customer_id, transaction_id, days_a_customer)...]

Where Subquery_C represents all transaction events performed by customers that have been around for > 30 days.

sqlalchemy-specific help is preferred but raw SQL would also be greatly appreciated.

Upvotes: 0

Views: 254

Answers (1)

mtndoe
mtndoe

Reputation: 444

Based on your question I am guessing you want:

  • Customers that are customer for more than 30 days and have a transaction.

All the customers are in one table, the transactions are in a separate table. To satisfy this question we can use an inner-join between the customers and transactions table (only return those records in both tables) and the filter method to filter on the number of days as customer. This method is shown in the example below.

from sqlalchemy import Column, Integer, String, DateTime
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine, ForeignKey
from sqlalchemy.orm import sessionmaker, relationship
from random import randint

Base = declarative_base()


## Define sample models


class Customer(Base):
    __tablename__ = 'customer'
    customer_id = Column('id', Integer, primary_key=True)
    nr_days_customer = Column('nr_days_customer', Integer)
    name = Column('name', String(250))
    transaction = relationship('Transaction', back_populates='customer')

    def __init__(self, nr_days_customer, name):
        self.nr_days_customer = nr_days_customer
        self.name = name

    def __repr__(self):
        return '[{customer_id}] - {name} - {nr_days_customer} days'.format(customer_id=self.customer_id,
                                                                           name=self.name,
                                                                           nr_days_customer=self.nr_days_customer)


class Transaction(Base):
    __tablename__ = 'transaction'
    transaction_id = Column('id', Integer, primary_key=True)
    customer_id = Column('customer_id', Integer, ForeignKey('customer.id'))
    name = Column(String(250))
    customer = relationship('Customer', back_populates='transaction')

    def __init__(self, customer_id, name):
        self.customer_id = customer_id
        self.name = name

    def __repr__(self):
        return '[{transaction_id}] {name} - customer: {customer_id}'.format(transaction_id=self.transaction_id,
                                                                            name=self.name,
                                                                            customer_id=self.customer_id)


engine = create_engine('sqlite:///')
session = sessionmaker()
session.configure(bind=engine)
ex_ses = session()
Base.metadata.create_all(engine)

## Create sample data

# http://listofrandomnames.com/ for something else then A,B,...
names = ['Virgil', 'Ina', 'Oleta', 'Suzette', 'Usha', 'Ilda', 'Lorean', 'Cinthia', 'Sheba', 'Waneta', 'Donnette',
         'Minerva', 'Owen', 'Georgia', 'Gertrudis', 'Sherika', 'Terry', 'Deja', 'Regena', 'Carolin']

# Create customers > 30 days with transaction
for customer_nr in range(5):
    ex_ses.add(Customer(name=names[customer_nr].format(customer_nr),
                        nr_days_customer=randint(30, 120)))

# Create customers < 30 days with transaction
for customer_nr in range(5):
    ex_ses.add(Customer(name=names[customer_nr + 5].format(customer_nr),
                        nr_days_customer=randint(0, 29)))

# Create customers > 30 days without transaction
for customer_nr in range(5):
    ex_ses.add(Customer(name=names[customer_nr + 10].format(customer_nr),
                        nr_days_customer=randint(30, 120)))

# Create customers < 30 days without transaction
for customer_nr in range(5):
    ex_ses.add(Customer(name=names[customer_nr + 15].format(customer_nr),
                        nr_days_customer=randint(0, 29)))

# Create transactions
for transaction_nr in range(10):
    ex_ses.add(Transaction(name='Transaction {}'.format(transaction_nr), customer_id=transaction_nr + 1))

ex_ses.commit()

# Queries
print('customers:')
for customer in ex_ses.query(Customer).all():
    print('* ', customer)

'''
customers:
*  [1] - Virgil - 113 days
*  [2] - Ina - 112 days
*  [3] - Oleta - 44 days
*  [4] - Suzette - 61 days
*  [5] - Usha - 82 days
*  [6] - Ilda - 10 days
*  [7] - Lorean - 14 days
*  [8] - Cinthia - 24 days
*  [9] - Sheba - 25 days
*  [10] - Waneta - 11 days
*  [11] - Donnette - 38 days
*  [12] - Minerva - 110 days
*  [13] - Owen - 98 days
*  [14] - Georgia - 68 days
*  [15] - Gertrudis - 36 days
*  [16] - Sherika - 18 days
*  [17] - Terry - 6 days
*  [18] - Deja - 7 days
*  [19] - Regena - 21 days
*  [20] - Carolin - 14 days
'''

print('Transactions:')
for transaction in ex_ses.query(Transaction).all():
    print('* ', transaction)

'''
Transactions:
*  [1] Transaction 0 - customer: 1
*  [2] Transaction 1 - customer: 2
*  [3] Transaction 2 - customer: 3
*  [4] Transaction 3 - customer: 4
*  [5] Transaction 4 - customer: 5
*  [6] Transaction 5 - customer: 6
*  [7] Transaction 6 - customer: 7
*  [8] Transaction 7 - customer: 8
*  [9] Transaction 8 - customer: 9
*  [10] Transaction 9 - customer: 10
'''


'''
-- using inner join only records in both tables are being returned
-- https://www.w3schools.com/sql/sql_join.asp
-- condition on more then 30 days
-- untested SQL:
SELECT customer.customer_id, transaction.transaction_id, customer.days_a_customer from customer
INNER JOIN transaction ON customer.customer_id = transaction.customer_id
WHERE customer.days_a_customer > 30 AND customer.customer_id.
'''

print('Customer - transactions:')
for cust_trans in ex_ses.query(Customer.name, Customer.nr_days_customer,Transaction.transaction_id).join(Transaction).filter(
        Customer.nr_days_customer > 30).all():
    print('* Customer {customer_name} - {nr_days_customer} --> transaction {transaction_id}'.format(
        customer_name = cust_trans[0],nr_days_customer=cust_trans[1],transaction_id=cust_trans[2]))

'''
Customer - transactions:
* Customer Virgil - 113 --> transaction 1
* Customer Ina - 112 --> transaction 2
* Customer Oleta - 44 --> transaction 3
* Customer Suzette - 61 --> transaction 4
* Customer Usha - 82 --> transaction 5
'''

As you can see in the commented text only the 5 customers that are customer for more than 30 days and are in the transaction table are returned by this query.

In the comments a fez links to more resources are provided.Next to these I want to emphasize following additional resources:

I'm not using mysql in the example above. I believe that the provided example can be easily applied to a setup with mysql.

Please comment if there are further questions.

Upvotes: 1

Related Questions