Quickbeam2k1

Reputation: 5437

Data Preparation: Alternatives to constructing a table in SQL that results in many columns

I'm trying to fetch data from our company's DWH server and use some machine learning algorithms on that data.

(Un)fortunately, the data I want to aggregate reside in different tables, e.g. demographic customer info, purchase behaviour, and received advertisements.

Furthermore, advertisements are classified into two classes, as are the products a customer can purchase. For each customer I need to construct a mapping from advertisements to purchases (depending on the time after an advert is delivered), so that I know whether an advertisement was successful or not.

Additionally, to build a machine learning algorithm, I need to assign purchases and received adverts from BEFORE I send out a new advertisement and summarize those data in aggregated form. Both received adverts and purchases are subject to seasonality.

Hence, it might be wise to aggregate data (like net profit, number of purchases, etc.) over the current season, the previous season, the season before that, and so on. Furthermore, recall that products are classified into two categories, which doubles the number of aggregates/columns.

I think you can sense where this is heading:

Is there an efficient alternative for fetching data from SQL for machine learning that avoids lots of columns?

Of course, I could postpone the aggregation and combination step to R or Python on my local machine, but the SQL server should typically be much faster.

Currently, my table would contain roughly 400 columns, and the queries haven't finished after 80 minutes on the SQL server.

Maybe I should add the following: In the final aggregation step I'm using many statements of the form:

SUM(IIF(PRODUCTTYPE = 1, NET, 0)) AS TYPE1NET
SUM(IIF(PRODUCTTYPE = 1 AND SEASON = 'WIN', NET, 0)) AS TYPE1NETWIN

I don't see a place where I could calculate this earlier on.
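For reference, here is a minimal sketch of what postponing the pivot to Python could look like: the server returns one row per (customer, product type, season) group in long format, and pandas reshapes it into wide columns locally. The FACTS table name, the CUSTOMERID column, and the connection string are placeholders; PRODUCTTYPE, SEASON and NET are taken from the query above.

import pandas as pd
import pyodbc  # any DB-API driver would do; connection details are placeholders

# Hypothetical long-format aggregation: one row per group instead of hundreds of columns.
QUERY = """
SELECT CUSTOMERID, PRODUCTTYPE, SEASON, SUM(NET) AS NET
FROM FACTS                               -- placeholder table name
GROUP BY CUSTOMERID, PRODUCTTYPE, SEASON
"""

conn = pyodbc.connect("DSN=dwh")         # placeholder connection string
long_df = pd.read_sql(QUERY, conn)

# Pivot to wide format on the client: the column (1, 'WIN') corresponds to TYPE1NETWIN.
wide_df = long_df.pivot_table(
    index="CUSTOMERID",
    columns=["PRODUCTTYPE", "SEASON"],
    values="NET",
    aggfunc="sum",
    fill_value=0,
)

The server then only does a plain GROUP BY, and the explosion into many columns happens in memory on the client.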

Upvotes: 0

Views: 97

Answers (1)

Denis Kulagin

Reputation: 8937

I work a lot with data, and sometimes the volumes are just huge. Dumping and transforming data into a form consumable by ML algorithms is a painful and usually time-consuming process.

I have tried using RDBMS capabilities to accomplish the task, but failed in most cases. The main reason: they are not fit for it (e.g. they don't support streaming of the data).

In my experience, the best way is to dump the data into CSV format and feed it to some key-value engine (HBase would do; I have also used a simple Java-based solution of my own making). Then choose your favourite programming language and iterate over the set, constructing your data and persisting it to disk.
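For illustration, a minimal sketch of that last iterate-and-persist step in Python, working straight off a CSV dump (the file paths and the column names customer_id and net are assumptions, not from the original post):

import csv
from collections import defaultdict

totals = defaultdict(float)  # aggregate per customer while streaming the dump

with open("dump.csv", newline="") as f:          # placeholder path to the CSV dump
    for row in csv.DictReader(f):
        # accumulate whatever aggregate you need, e.g. net revenue per customer
        totals[row["customer_id"]] += float(row["net"])

# persist the constructed features to disk for the ML step
with open("features.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["customer_id", "total_net"])
    for customer_id, total_net in totals.items():
        writer.writerow([customer_id, total_net])

Because the dump is processed row by row, memory usage stays flat regardless of how large the table is.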

Upvotes: 1
