Same code with different indentation gives different results?

Question

Below are two code snippets that differ only in indentation, yet they give different outputs. The first one outputs

array([[7.68717818e+11, 6.97681947e+11, 7.63307666e+11, 6.47349007e+11,
        5.99065727e+11],
       [7.66709788e+11, 6.99939453e+11, 7.64013608e+11, 6.45261906e+11,
        5.99177884e+11],
       [2.94202392e+11, 4.16042886e+12, 3.69055160e+11, 3.08724683e+11,
        3.71083543e+11],
       ...,
       [2.32972294e+11, 2.25518151e+11, 2.16985500e+11, 1.52392619e+11,
        1.87686750e+11],
       [1.31495500e+11, 4.03441481e+11, 1.66796570e+11, 7.37775506e+10,
        1.55474795e+11],
       [1.26216951e+11, 5.60385882e+11, 1.49446146e+11, 7.23769941e+10,
        1.45692856e+11]])

The second one gives

array([[7.66709788e+11, 6.99939453e+11, 7.64013608e+11, 6.45261906e+11,
        5.99177884e+11],
       [2.94202392e+11, 4.16042886e+12, 3.69055160e+11, 3.08724683e+11,
        3.71083543e+11],
       [2.92378752e+11, 2.93673677e+12, 3.58775694e+11, 3.06625155e+11,
        3.56276475e+11],
       ...,
       [1.31495500e+11, 4.03441481e+11, 1.66796570e+11, 7.37775506e+10,
        1.55474795e+11],
       [1.26216951e+11, 5.60385882e+11, 1.49446146e+11, 7.23769941e+10,
        1.45692856e+11],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00]])

Why do they give different results? Is this a general fact about how Python works? I thought the indentation was valid in both cases, but apparently there are several kinds of "proper indentation" that can give different results?

First snippet:

mses = np.zeros((len(all_models), 5))

kfold = KFold(5, 
              random_state = 614,
              shuffle = True)
j = 0
for train_index, test_index in kfold.split(cars_train):
    cars_tt = cars_train.iloc[train_index]
    cars_ho = cars_train.iloc[test_index]
    i = 0
    for model in all_models:
        if model == "baseline":
          pred = np.power(10,cars_tt.log_sell.mean()*np.ones(len(cars_ho)))
          mses[i, j] = mean_squared_error(cars_ho.selling_price, pred)
        else:
          if len(model) == 1:
            reg = LinearRegression(copy_X = True)                
            reg.fit(cars_tt[model].values.reshape(-1,1), cars_tt.log_sell.values)
            pred = np.power(10,reg.predict(cars_ho[model].values))
            mses[i,j] = mean_squared_error(cars_ho.selling_price, pred)
          else:
            reg = LinearRegression(copy_X = True)
            reg.fit(cars_tt[model].values, cars_tt.log_sell.values)
            pred = np.power(10,reg.predict(cars_ho[model].values))
            mses[i,j] = mean_squared_error(cars_ho.selling_price, pred)
        i = i + 1
    j = j + 1
mses

Second snippet:

mses = np.zeros((len(all_models), 5))

kfold = KFold(5, 
              random_state = 614,
              shuffle = True)
j = 0
for train_index, test_index in kfold.split(cars_train):
    cars_tt = cars_train.iloc[train_index]
    cars_ho = cars_train.iloc[test_index]
    i = 0
    for model in all_models:
      if model == "baseline":
          pred = np.power(10,cars_tt.log_sell.mean()*np.ones(len(cars_ho)))
          mses[i, j] = mean_squared_error(cars_ho.selling_price, pred)
      else:
        if len(model) == 1:
          reg = LinearRegression(copy_X = True)                
          reg.fit(cars_tt[model].values.reshape(-1,1), cars_tt.log_sell.values)
          pred = np.power(10,reg.predict(cars_ho[model].values))
          mses[i,j] = mean_squared_error(cars_ho.selling_price, pred)
        else:
          reg = LinearRegression(copy_X = True)
          reg.fit(cars_tt[model].values, cars_tt.log_sell.values)
          pred = np.power(10,reg.predict(cars_ho[model].values))
          mses[i,j] = mean_squared_error(cars_ho.selling_price, pred)
        i = i + 1
    j = j + 1
mses

John Kugelman · Accepted Answer

In the first snippet, i = i + 1 is indented to the same level as the outer if/else:, so it belongs to neither branch and runs after the whole if/else statement.

if model == "baseline":
  ...
else:
  if len(model) == 1:
    ...
  else:
    ...
i = i + 1

In the second snippet, it is indented one level deeper, so it is part of the outer else: block.

if model == "baseline":
  ...
else:
  if len(model) == 1:
    ...
  else:
    ...
  i = i + 1

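This difference affects when and how many times i = i + 1 is executed. When the increment follows the if/else, i advances once per model, so each model writes to its own row of mses. When the increment sits inside the else: block, i does not advance after the "baseline" model, so the next model overwrites row 0, every later result lands one row early, and the last row keeps its initial zeros. That matches the two outputs in the question: the first row of the second output equals the second row of the first output, and the second output ends in a row of zeros.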
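The effect can be reproduced without NumPy or scikit-learn. The following is a minimal sketch, not the asker's code: the function names fill_first and fill_second, the list out, and the model names "m1" and "m2" are hypothetical stand-ins chosen for illustration. The two functions differ only in the indentation of the increment.

```python
def fill_first(models):
    """Increment aligned with if/else: runs once per model."""
    out = [0] * len(models)
    i = 0
    for model in models:
        if model == "baseline":
            out[i] = "baseline"
        else:
            out[i] = model
        i = i + 1  # same level as if/else, so it runs for every model
    return out


def fill_second(models):
    """Increment inside else: skipped when model == "baseline"."""
    out = [0] * len(models)
    i = 0
    for model in models:
        if model == "baseline":
            out[i] = "baseline"
        else:
            out[i] = model
            i = i + 1  # inside else, so it is skipped for "baseline"
    return out


models = ["baseline", "m1", "m2"]  # hypothetical model names
print(fill_first(models))   # ['baseline', 'm1', 'm2']
print(fill_second(models))  # ['m1', 'm2', 0]
```

In fill_second the "baseline" entry is overwritten by "m1" and the last slot is never filled, which is the same pattern as the shifted rows and trailing zeros in the question.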

If Python had curly braces it would be the difference between:

if model == "baseline" {
  ...
}
else {
  if len(model) == 1 {
    ...
  }
  else {
    ...
  }
}
i = i + 1

and:

if model == "baseline" {
  ...
}
else {
  if len(model) == 1 {
    ...
  }
  else {
    ...
  }
  i = i + 1
}
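One way to make this kind of mistake harder to write, offered here as a suggestion rather than as part of the original answer, is to let enumerate supply the index instead of incrementing a counter by hand. With no i = i + 1 line, there is nothing to indent at the wrong level. The function name fill_with_enumerate and the model names below are hypothetical.

```python
def fill_with_enumerate(models):
    """Write each model's result to its own slot using enumerate."""
    out = [0] * len(models)
    for i, model in enumerate(models):  # i advances on every iteration
        if model == "baseline":
            out[i] = "baseline"
        else:
            out[i] = model
    return out


print(fill_with_enumerate(["baseline", "m1", "m2"]))  # ['baseline', 'm1', 'm2']
```

The same idea would apply to both counters in the question, for example for j, (train_index, test_index) in enumerate(kfold.split(cars_train)), assuming the rest of the loop body is unchanged.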
