paradocslover

Reputation: 3294

Looping over test and train set simultaneously

I was going through a tutorial on time series and came across something like this:

for i in (train,test):
    print(i)

My expectation was that we are iterating over a tuple of train and test. But to my surprise, it processed all of the train data first, followed by all of the test data. What is actually happening behind the scenes?

EDIT: train and test are pandas DataFrames. Assume the code is

for i in (a,b):
    print(i)

Then the output is as follows.

In case of lists:

[1,2,3]
[2,4]

In case of dataframes:

   0
0  1
1  2
2  3
   0
0  2
1  4

Upvotes: 1

Views: 486

Answers (1)

André C. Andersen

Reputation: 9385

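In Python you can create a tuple (roughly, an immutable list) by writing (1, 2, 3), much as you create a list with [1, 2, 3]. Your for-loop builds a tuple of length two, with entries train and test, and then loops over those two entries. The body therefore runs exactly twice: i is the whole train DataFrame on the first pass and the whole test DataFrame on the second, so print(i) prints each DataFrame in full, one after the other.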

The following prints 1, 2, and 3:

my_tuple = (1, 2, 3)
for i in my_tuple:
    print(i)

... same as this:

for i in (1, 2, 3):
    print(i)
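
To tie this back to the question, here is a minimal sketch that uses plain lists as stand-ins for the two DataFrames (the values are taken from the question's list example; the names seen and pairs are only for illustration). It shows that the loop body runs once per tuple entry, and how zip could be used if the intent were to step through both collections in parallel:

```python
# Plain lists stand in for the train and test DataFrames (an assumption for illustration).
train = [1, 2, 3]
test = [2, 4]

# The tuple has two entries, so the loop body runs exactly twice.
seen = []
for dataset in (train, test):
    seen.append(dataset)
    print(dataset)  # prints the whole object, not one element at a time

# To walk both collections in parallel instead, pair the elements up with zip.
# zip stops at the shorter input, so the trailing 3 in train is dropped here.
pairs = list(zip(train, test))
print(pairs)
```

Note that iterating over a pandas DataFrame directly yields its column labels rather than its rows, so for real DataFrames a parallel row-wise loop would need something other than a bare zip(train, test).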

The reason your tutorial writes this as a loop is simply that the operations needed to run predictions on train and test are identical.

An example which is probably closer to what your tutorial is doing is the following:

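# load_train_data, train_model and load_test_data are placeholders
# for whatever your tutorial uses.
train = load_train_data()
model = train_model(train)
test = load_test_data()
# Runs twice: once with dataset = train, once with dataset = test.
for dataset in (train, test):
    predictions = model.predict(dataset)
    print(predictions)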

Which is just the same as:

train = load_train_data()
model = train_model(train)
test = load_test_data()
train_predictions = model.predict(train)
print(train_predictions)
test_predictions = model.predict(test)
print(test_predictions)
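
If you want to convince yourself that the two versions are equivalent, the following is a small runnable sketch. MeanModel is a hypothetical stand-in for a fitted model (it is not from any library and simply predicts the training mean), and the lists stand in for the real data:

```python
class MeanModel:
    """Hypothetical stand-in for a fitted model: always predicts the training mean."""

    def __init__(self, values):
        self.mean = sum(values) / len(values)

    def predict(self, dataset):
        # One prediction per element of the dataset.
        return [self.mean for _ in dataset]


train = [1, 2, 3]
test = [2, 4]
model = MeanModel(train)

# Looped version: the body runs once for train, then once for test.
looped = []
for dataset in (train, test):
    looped.append(model.predict(dataset))

# Unrolled version: the same two calls written out by hand.
unrolled = [model.predict(train), model.predict(test)]

print(looped == unrolled)
```

The loop form mainly saves repetition, which matters more as the shared body grows beyond a single predict call.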

Upvotes: 2
