Tyler Moore
Tyler Moore

Reputation: 171

SQL - WHERE - variable dynamic filter

I'm trying to write a WHERE clause that will filter data differently for respective entities, based on time frames that are unique to each entity, like so:

import pandas as pd

product_dates = [
        {'name': 'product_1', 'start_date': '2024-01-01', 'end_date': '2024-05-01'},
        {'name': 'product_2', 'start_date': '2024-03-05', 'end_date': '2024-04-26'},
        {'name': 'product_3', 'start_date': '2024-02-09', 'end_date': '2024-11-08'}
]

df_product = pd.DataFrame(product_dates)

What I'd like to see for the output is the number of sales from each product that are between the given dates, like so:

output_data = [
        {'name': 'product_1', 'time_frame': '2024-01-01 to 2024-05-01', 'sales': 204},
        {'name': 'product_2', 'time_frame': '2024-03-05 to 2024-04-26', 'sales': 1890},
        {'name': 'product_3', 'time_frame': '2024-02-09 to 2024-11-08', 'sales': 766}
]


df_output = pd.DataFrame(output_data)

And to get that, there's a table of product sales structured like this:

sales_data = [
        {'name': 'product_1', 'date': '2024-01-01', 'sales': 1},
        {'name': 'product_2', 'date': '2024-01-01', 'sales': 3},
        {'name': 'product_3', 'date': '2024-01-01', 'sales': 4},
    ...
]


df_sales = pd.DataFrame(sales_data)

Theoretically the SQL clause would be:

SELECT name, time_frame, SUM(quantity) AS sales
FROM sales_data
WHERE date BETWEEN start_date AND end_date
GROUP BY ALL

But I'm not sure how to have the WHERE clause reflect the fact that the dates are variable by product.

Any tips? Thank you!

Upvotes: 1

Views: 64

Answers (2)

samhita
samhita

Reputation: 3505

I am basing the logic on @Dale's comment,I am not too sure whether expectation is to have answer in pandas or sql as your expected output is in pandas but I guess there is already an answer for SQL, so I added for pandas.

df_product = pd.DataFrame(product_dates)
df_sales = pd.DataFrame(sales_data)


df_product['start_date'] = pd.to_datetime(df_product['start_date'])
df_product['end_date'] = pd.to_datetime(df_product['end_date'])
df_sales['date'] = pd.to_datetime(df_sales['date'])

def filter_sales(row):
    product_name = row['name']
    start_date = row['start_date']
    end_date = row['end_date']
    
    filtered_sales = df_sales[(df_sales['name'] == product_name) &
                              (df_sales['date'] >= start_date) &
                              (df_sales['date'] <= end_date)]
    
    total_sales = filtered_sales['sales'].sum()
    return total_sales

df_product['sales'] = df_product.apply(filter_sales, axis=1)
df_product['time_frame'] = df_product.apply(lambda row: f"{row['start_date'].date()} to {row['end_date'].date()}", axis=1)
df_output = df_product[['name', 'time_frame', 'sales']]

print(df_output)

which for the sample data

product_dates = [
    {'name': 'product_1', 'start_date': '2024-01-01', 'end_date': '2024-05-01'},
    {'name': 'product_2', 'start_date': '2024-03-05', 'end_date': '2024-04-26'},
    {'name': 'product_3', 'start_date': '2024-02-09', 'end_date': '2024-11-08'}
]

sales_data = [
    {'name': 'product_1', 'date': '2024-01-01', 'sales': 1},
    {'name': 'product_1', 'date': '2024-02-01', 'sales': 3},
    {'name': 'product_1', 'date': '2024-06-01', 'sales': 4},
    {'name': 'product_2', 'date': '2024-03-06', 'sales': 2},
    {'name': 'product_2', 'date': '2024-04-25', 'sales': 3},
    {'name': 'product_3', 'date': '2024-02-10', 'sales': 5},
    {'name': 'product_3', 'date': '2024-11-09', 'sales': 6}
]

generates

        name                time_frame  sales
0  product_1  2024-01-01 to 2024-05-01      4
1  product_2  2024-03-05 to 2024-04-26      5
2  product_3  2024-02-09 to 2024-11-08      5

Upvotes: 1

codeKracken
codeKracken

Reputation: 302

As per Dale K.s comment, In SQL you can use a JOIN for the sales_data and product_dates, this should create a link that you can use to filter the sales to a start date and end date then use SUM to add up the amount sold in that date range.

It could look something like this:

SELECT 
    s.name, 
    CONCAT(p.start_date, ' to ', p.end_date) AS time_frame, 
    SUM(s.sales) AS sales
FROM sales_data s
JOIN product_dates p ON s.name = p.name
WHERE s.date BETWEEN p.start_date AND p.end_date
GROUP BY s.name, p.start_date, p.end_date;

Upvotes: 5

Related Questions