Reputation: 103
Below are two code snippets that differ only in indentation, yet they give different outputs. The first one outputs
array([[7.68717818e+11, 6.97681947e+11, 7.63307666e+11, 6.47349007e+11,
        5.99065727e+11],
       [7.66709788e+11, 6.99939453e+11, 7.64013608e+11, 6.45261906e+11,
        5.99177884e+11],
       [2.94202392e+11, 4.16042886e+12, 3.69055160e+11, 3.08724683e+11,
        3.71083543e+11],
       ...,
       [2.32972294e+11, 2.25518151e+11, 2.16985500e+11, 1.52392619e+11,
        1.87686750e+11],
       [1.31495500e+11, 4.03441481e+11, 1.66796570e+11, 7.37775506e+10,
        1.55474795e+11],
       [1.26216951e+11, 5.60385882e+11, 1.49446146e+11, 7.23769941e+10,
        1.45692856e+11]])
The second one gives
array([[7.66709788e+11, 6.99939453e+11, 7.64013608e+11, 6.45261906e+11,
        5.99177884e+11],
       [2.94202392e+11, 4.16042886e+12, 3.69055160e+11, 3.08724683e+11,
        3.71083543e+11],
       [2.92378752e+11, 2.93673677e+12, 3.58775694e+11, 3.06625155e+11,
        3.56276475e+11],
       ...,
       [1.31495500e+11, 4.03441481e+11, 1.66796570e+11, 7.37775506e+10,
        1.55474795e+11],
       [1.26216951e+11, 5.60385882e+11, 1.49446146e+11, 7.23769941e+10,
        1.45692856e+11],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00]])
Why do they give different results? Is this a general fact about how Python works? I thought the indentation was valid in both cases, but apparently there are several kinds of "proper indentation" that can give different results?
mses = np.zeros((len(all_models), 5))
kfold = KFold(5,
              random_state = 614,
              shuffle = True)
j = 0
for train_index, test_index in kfold.split(cars_train):
    cars_tt = cars_train.iloc[train_index]
    cars_ho = cars_train.iloc[test_index]
    i = 0
    for model in all_models:
        if model == "baseline":
            pred = np.power(10,cars_tt.log_sell.mean()*np.ones(len(cars_ho)))
            mses[i, j] = mean_squared_error(cars_ho.selling_price, pred)
        else:
            if len(model) == 1:
                reg = LinearRegression(copy_X = True)
                reg.fit(cars_tt[model].values.reshape(-1,1), cars_tt.log_sell.values)
                pred = np.power(10,reg.predict(cars_ho[model].values))
                mses[i,j] = mean_squared_error(cars_ho.selling_price, pred)
            else:
                reg = LinearRegression(copy_X = True)
                reg.fit(cars_tt[model].values, cars_tt.log_sell.values)
                pred = np.power(10,reg.predict(cars_ho[model].values))
                mses[i,j] = mean_squared_error(cars_ho.selling_price, pred)
        i = i + 1
    j = j + 1
mses
mses = np.zeros((len(all_models), 5))
kfold = KFold(5,
              random_state = 614,
              shuffle = True)
j = 0
for train_index, test_index in kfold.split(cars_train):
    cars_tt = cars_train.iloc[train_index]
    cars_ho = cars_train.iloc[test_index]
    i = 0
    for model in all_models:
        if model == "baseline":
            pred = np.power(10,cars_tt.log_sell.mean()*np.ones(len(cars_ho)))
            mses[i, j] = mean_squared_error(cars_ho.selling_price, pred)
        else:
            if len(model) == 1:
                reg = LinearRegression(copy_X = True)
                reg.fit(cars_tt[model].values.reshape(-1,1), cars_tt.log_sell.values)
                pred = np.power(10,reg.predict(cars_ho[model].values))
                mses[i,j] = mean_squared_error(cars_ho.selling_price, pred)
            else:
                reg = LinearRegression(copy_X = True)
                reg.fit(cars_tt[model].values, cars_tt.log_sell.values)
                pred = np.power(10,reg.predict(cars_ho[model].values))
                mses[i,j] = mean_squared_error(cars_ho.selling_price, pred)
            i = i + 1
    j = j + 1
mses
Upvotes: 0
Views: 47
Reputation: 361556
In the first snippet, i = i + 1 is indented to line up with the outer else: block, so it comes after that block and runs on every iteration of the loop, whichever branch was taken.
if model == "baseline":
    ...
else:
    if len(model) == 1:
        ...
    else:
        ...
i = i + 1
In the second snippet, it is indented so that it sits inside the outer else: block, which means it is skipped whenever model == "baseline".
if model == "baseline":
    ...
else:
    if len(model) == 1:
        ...
    else:
        ...
    i = i + 1
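This difference affects when, and how many times, i = i + 1 is executed. In the second snippet i is not incremented after the "baseline" model, so the next model writes into row 0 again and the last row of mses is never filled, which is why that output is shifted up by one row and ends in zeros.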
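A minimal sketch of the same effect, stripped of the regression code. The `models` list and the two helper functions are hypothetical stand-ins, not the asker's data; each function records which row index `i` would be written for each model.

```python
# Hypothetical stand-in for all_models: one "baseline" plus two feature lists.
models = ["baseline", ["a"], ["a", "b"]]


def rows_first(models):
    """Mirror the first snippet: the increment is aligned with if/else."""
    rows = []
    i = 0
    for model in models:
        if model == "baseline":
            rows.append(i)
        else:
            rows.append(i)
        i = i + 1  # runs for every model
    return rows


def rows_second(models):
    """Mirror the second snippet: the increment is inside the outer else."""
    rows = []
    i = 0
    for model in models:
        if model == "baseline":
            rows.append(i)
        else:
            rows.append(i)
            i = i + 1  # skipped when model == "baseline"
    return rows


print(rows_first(models))   # [0, 1, 2]
print(rows_second(models))  # [0, 0, 1]
```

In the second version row 0 is written twice and the last row is never reached, which is consistent with the row of zeros at the end of the second output.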
If Python had curly braces it would be the difference between:
if model == "baseline" {
    ...
}
else {
    if len(model) == 1 {
        ...
    }
    else {
        ...
    }
}
i = i + 1
and:
if model == "baseline" {
    ...
}
else {
    if len(model) == 1 {
        ...
    }
    else {
        ...
    }
    i = i + 1
}
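One possible way to avoid this kind of mistake altogether, offered as a suggestion rather than as part of the explanation above, is to let the built-in enumerate supply the index so there is no manual i = i + 1 whose indentation can drift. The sketch below uses a hypothetical `models` list, a plain nested list in place of the NumPy array, and made-up placeholder scores.

```python
# Hypothetical stand-ins: three models, two folds, placeholder scores.
models = ["baseline", ["a"], ["a", "b"]]
n_folds = 2
scores = [[0.0] * n_folds for _ in models]

for j in range(n_folds):
    # enumerate yields (index, item), so i always matches the model's position.
    for i, model in enumerate(models):
        if model == "baseline":
            scores[i][j] = 1.0
        else:
            scores[i][j] = float(len(model)) + 1.0

print(scores)
```

Because the index comes from the loop itself, every model gets its own row whichever branch runs.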
Upvotes: 2