Reputation: 223
I'm currently working on an online advertisement optimizer project. Let's assume that the only thing I can change is CPC (cost per click). I don't have much data, as the data is updated only once a day. I want to predict net_income from CPC and have the program suggest the CPC value that maximizes tomorrow's net_income, based on the data that updates every day.
cpc margin
0 440 -95224.0
1 840 -81620.0
2 530 -57496.0
3 590 -47287.0
4 560 -45681.0
5 590 -52766.0
6 500 -60852.0
7 650 -59653.0
8 480 -48905.0
9 620 -56496.0
10 680 -53614.0
11 590 -44440.0
12 460 -34066.0
13 720 -31086.0
14 590 -23177.0
15 680 -12803.0
16 760 -10625.0
17 590 -20548.0
18 800 -15136.0
19 650 -12804.0
20 420 -63435.0
21 400 -7566.0
22 400 21136.0
23 400 -58585.0
24 400 -14166.0
25 420 -23065.0
26 400 -28533.0
27 380 -14454.0
28 400 -50819.0
29 380 -26356.0
30 400 -26322.0
31 380 -19107.0
32 400 -28270.0
33 380 -88439.0
34 360 -32207.0
35 340 -27632.0
36 340 -18050.0
37 340 -71574.0
38 340 -18050.0
39 320 -20735.0
40 300 -17984.0
41 290 -9426.0
42 280 -16555.0
43 290 2961.0
For instance, say the above data is df. I tried to use sklearn and LogisticRegression to get the prediction:
import pandas as pd
from sklearn import datasets
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(df['cpc'], df['margin'])
prediction = model.predict([[300]])
print(prediction[0])
margin is net_income, btw.
So by doing this, I thought I might get the prediction based on the data when CPC is 300, but it returned an error saying:
ValueError: Expected 2D array, got 1D array instead:
array=[440 840 530 590 560 590 500 650 480 620 680 590 460 720 590 680 760 590
800 650 420 400 400 400 400 420 400 380 400 380 400 380 400 380 360 340
340 340 340 320 300 290 280 290].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
I've been looking for examples using linear regression or logistic regression models, but they all use a 2-D array for input, which doesn't seem to fit my needs. I only have one factor that I can change, and the result is simply the net_income (or margin).
How would I use sklearn on my project? Or is there maybe another better way to solve the problem?
I'm pretty new to programming and have no knowledge of math and statistics which makes it harder for me to understand or get keywords to study... please guide me on this.
---------------------------------updated-------------------------------------
All right, let me give you another df:
cpc margin
0 440 -35224.0
1 340 -11574.0
2 380 -68439.0
3 420 -23435.0
4 840 -81620.0
5 400 -38585.0
6 530 -37496.0
7 590 -7287.0
8 560 -5681.0
9 590 -32766.0
10 500 -60852.0
11 400 -30819.0
12 650 -59653.0
13 480 -28905.0
14 620 -56496.0
15 680 -53614.0
16 590 -44440.0
17 460 -14066.0
18 420 16935.0
19 360 -12207.0
20 400 -8533.0
21 400 -6322.0
22 400 25834.0
23 720 -31086.0
24 400 121136.0
25 400 -28270.0
26 340 1950.0
27 340 1950.0
28 300 2016.0
29 340 -27632.0
30 400 32434.0
31 380 -26356.0
32 590 -23177.0
33 680 7197.0
34 320 -20735.0
35 760 9375.0
36 590 -20548.0
37 290 10574.0
38 380 -19107.0
39 290 42961.0
40 280 -16555.0
41 800 -15136.0
42 380 -14454.0
43 650 -12804.0
Thanks to your answers, I could get further, as shown below. Once my code ran without errors, I thought that by looping over candidate inputs I would be able to find the optimal cpc value.
import pandas as pd
from sklearn import datasets
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
df = pd.DataFrame(final_db)
model = LogisticRegression()
x = df[['cpc']]
model.fit(x, df['margin'])
previous_prediction = -99999999999999
df_prediction = []
for i in list(range(10, 1000, 10)):
    prediction = model.predict([[i]])
    df_prediction.append({'cpc':i, 'margin' : prediction})
    if prediction > previous_prediction:
        previous_prediction = prediction
        previous_i = i
The result isn't very satisfying. Based on the data I have, is there a better model to use? Do you have any other suggestions for achieving my goal?
Upvotes: 0
Views: 89
Reputation: 5451
I guess it is complaining about this line:
model.fit(df['cpc'], df['margin'])
The first parameter should be a two-dimensional array. You can index the DataFrame with a list of column names,
df[['cpc']]
to get a DataFrame instead of a Series, which will fix the issue.
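As a minimal sketch of that fix: the snippet below uses only the first five rows of the question's first table, and it assumes LinearRegression is the intended model, since margin is a continuous number and LogisticRegression is a classifier that would treat each distinct margin as a class label. The variable names X and y are just for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# First five rows of the question's first table (the full df would be used in practice).
df = pd.DataFrame({
    "cpc": [440, 840, 530, 590, 560],
    "margin": [-95224.0, -81620.0, -57496.0, -47287.0, -45681.0],
})

# Single brackets give a 1D Series; double brackets give a 2D DataFrame.
print(df["cpc"].shape)    # (5,)
print(df[["cpc"]].shape)  # (5, 1)

# The reshape suggested by the error message produces the same 2D shape.
print(df["cpc"].to_numpy().reshape(-1, 1).shape)  # (5, 1)

X = df[["cpc"]]   # 2D: one row per sample, one column per feature
y = df["margin"]  # 1D target is fine

# Assumption: a regression model is what is wanted for a continuous target.
model = LinearRegression()
model.fit(X, y)

# Predict for cpc = 300, passing a one-row DataFrame with the same column name.
prediction = model.predict(pd.DataFrame({"cpc": [300]}))
print(prediction[0])
```

Having only one feature is not a problem: the input just needs the shape (n_samples, 1) rather than (n_samples,).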
Upvotes: 1