Taewoo.Lim

Reputation: 223

prediction based on updating data

I'm currently working on an online advertisement optimizer project. Let's assume that the only thing I can change is the CPC (cost per click). I don't have much data, as the data is updated only once a day. I want to predict net_income from CPC and have the program suggest the CPC value that maximizes tomorrow's net_income, based on the data that updates every day.

    cpc   margin
0   440 -95224.0
1   840 -81620.0
2   530 -57496.0
3   590 -47287.0
4   560 -45681.0
5   590 -52766.0
6   500 -60852.0
7   650 -59653.0
8   480 -48905.0
9   620 -56496.0
10  680 -53614.0
11  590 -44440.0
12  460 -34066.0
13  720 -31086.0
14  590 -23177.0
15  680 -12803.0
16  760 -10625.0
17  590 -20548.0
18  800 -15136.0
19  650 -12804.0
20  420 -63435.0
21  400  -7566.0
22  400  21136.0
23  400 -58585.0
24  400 -14166.0
25  420 -23065.0
26  400 -28533.0
27  380 -14454.0
28  400 -50819.0
29  380 -26356.0
30  400 -26322.0
31  380 -19107.0
32  400 -28270.0
33  380 -88439.0
34  360 -32207.0
35  340 -27632.0
36  340 -18050.0
37  340 -71574.0
38  340 -18050.0
39  320 -20735.0
40  300 -17984.0
41  290  -9426.0
42  280 -16555.0
43  290   2961.0

For instance, say the above data is df.

I tried to use sklearn and LogisticRegression to get the prediction:

import pandas as pd
from sklearn import datasets
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(df['cpc'], df['margin'])
prediction = model.predict([[300]])
print(prediction[0])

margin is net_income, btw.

So by doing this, I thought I might get the prediction based on the data when CPC is 300, but it returned an error saying:

ValueError: Expected 2D array, got 1D array instead:
array=[440 840 530 590 560 590 500 650 480 620 680 590 460 720 590 680 760 590
 800 650 420 400 400 400 400 420 400 380 400 380 400 380 400 380 360 340
 340 340 340 320 300 290 280 290].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I've been looking for examples that use linear regression or logistic regression models, but they all use a 2-D array for input, which doesn't seem to fit my needs. I only have one factor that I can change, and the result is simply the net_income (or margin).

How would I use sklearn on my project? Or is there maybe another better way to solve the problem?

I'm pretty new to programming and have no knowledge of math or statistics, which makes it harder for me to understand the problem or find keywords to study. Please guide me on this.

---------------------------------updated-------------------------------------

All right, let me give you another df:

    cpc    margin
0   440  -35224.0
1   340  -11574.0
2   380  -68439.0
3   420  -23435.0
4   840  -81620.0
5   400  -38585.0
6   530  -37496.0
7   590   -7287.0
8   560   -5681.0
9   590  -32766.0
10  500  -60852.0
11  400  -30819.0
12  650  -59653.0
13  480  -28905.0
14  620  -56496.0
15  680  -53614.0
16  590  -44440.0
17  460  -14066.0
18  420   16935.0
19  360  -12207.0
20  400   -8533.0
21  400   -6322.0
22  400   25834.0
23  720  -31086.0
24  400  121136.0
25  400  -28270.0
26  340    1950.0
27  340    1950.0
28  300    2016.0
29  340  -27632.0
30  400   32434.0
31  380  -26356.0
32  590  -23177.0
33  680    7197.0
34  320  -20735.0
35  760    9375.0
36  590  -20548.0
37  290   10574.0
38  380  -19107.0
39  290   42961.0
40  280  -16555.0
41  800  -15136.0
42  380  -14454.0
43  650  -12804.0

Thanks to your answers, I could go further, as shown below. Once my code ran without errors, I thought that by looping over the input I would be able to find the optimal cpc value.

import pandas as pd
from sklearn import datasets
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
df = pd.DataFrame(final_db)
model = LogisticRegression()
x = df[['cpc']]
model.fit(x, df['margin'])
previous_prediction = -99999999999999
df_prediction = []
for i in list(range(10, 1000, 10)):
    prediction = model.predict([[i]])
    df_prediction.append({'cpc':i, 'margin' : prediction})
    if prediction > previous_prediction:
        previous_prediction = prediction
        previous_i = i

The result was as follows (image not included).

This isn't very satisfying. Based on the data I have, is there a better model to use? Are there any other suggestions for achieving my goal?

Upvotes: 0

Views: 89

Answers (1)

Dev Khadka

Reputation: 5451

I guess it is complaining about this line:
model.fit(df['cpc'], df['margin'])

The first parameter should be a two-dimensional array. You can index the DataFrame with a list of column names,
df[['cpc']]
to get a DataFrame instead of a Series, which will fix the issue.
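
A minimal sketch of that fix, under a few assumptions that are not part of the answer: it uses LinearRegression (the question's text mentions LogisticRegression, which is a classifier, while margin is a continuous value), and it builds df from only a handful of rows copied from the question's first table.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# A few rows copied from the question's first table, for illustration only.
df = pd.DataFrame({
    "cpc": [440, 840, 530, 590, 400, 300, 290],
    "margin": [-95224.0, -81620.0, -57496.0, -47287.0,
               -7566.0, -17984.0, -9426.0],
})

X = df[["cpc"]]   # one-column DataFrame, shape (7, 1): the 2D input fit() expects
y = df["margin"]  # Series, shape (7,): a 1D target is fine

# Assumption: a regressor is used here because margin is continuous.
model = LinearRegression()
model.fit(X, y)

# predict() needs 2D input as well; a one-column DataFrame keeps the feature name.
query = pd.DataFrame({"cpc": [300]})
prediction = model.predict(query)
print(prediction[0])
```

Note that this only addresses the shape error. A straight-line fit always puts the best predicted margin at one end of whatever CPC range is scanned, so it may not be enough on its own to suggest an optimal CPC.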

Upvotes: 1
