Zal78

Reputation: 87

Big data in PyTorch: help with tuning steps

I've previously split my big dataset:

# X_train.shape : 4M samples x 2K features
# X_test.shape : 2M samples x 2K features

I've prepared the dataloaders:

target = torch.tensor(y_train.to_numpy())
features = torch.tensor(X_train.values)
train = data_utils.TensorDataset(features, target)
train_loader = data_utils.DataLoader(train, batch_size=10000, shuffle=True) 

testtarget = torch.tensor(y_test.to_numpy())
testfeatures = torch.tensor(X_test.values)
test = data_utils.TensorDataset(testfeatures, testtarget)
validation_generator = data_utils.DataLoader(test, batch_size=20000, shuffle=True) 

I copied this example for a network from an online course (no idea whether other models are better):

base_elastic_model = ElasticNet()
param_grid = {'alpha':[0.1,1,5,10,50,100],
              'l1_ratio':[.1, .5, .7, .9, .95, .99, 1]}
grid_model = GridSearchCV(estimator=base_elastic_model,
                          param_grid=param_grid,
                          scoring='neg_mean_squared_error',
                          cv=5,
                          verbose=0)

I've built this fitting loop:

for epoch in range(1):
    # Training
    cont=0
    total = 0
    correct = 0
    for local_batch, local_labels in train_loader:
        cont+=1
        with torch.set_grad_enabled(True):
            grid_model.fit(local_batch,local_labels)
        with torch.set_grad_enabled(False):
            predicted = grid_model.predict(local_batch)
            total += len(local_labels)
            correct += ((1*(predicted>.5)) == np.array(local_labels)).sum()
        #print stats

    # Validation
    total = 0
    correct = 0

    with torch.set_grad_enabled(False):
        for local_batch, local_labels in validation_generator:
            predicted = grid_model.predict(local_batch)
            total += len(local_labels)
            correct += ((1*(predicted>.5)) == np.array(local_labels)).sum()
            #print stats

Maybe my grandchildren will have the results for 1 epoch!

I need some advice:

  1. How/where (in the code) can I quickly use less data for a first tuning pass?
  2. Any advice on the steps needed to get a result in 2022?
  3. Because I've added "with torch.set_grad_enabled(False):" for printing stats, do I have to add (as I did) "with torch.set_grad_enabled(True):"?
  4. I have a GPU (is it useful without images?). I have the function "get_device()". Where do I have to put ".to(get_device())" to use CUDA?
  5. I'm learning by putting pieces of information together. Do you have any general advice for my exercise?

Upvotes: 2

Views: 156

Answers (1)

gerda die gandalfziege

Reputation: 762

  1. You can shorten the training process by simply stopping the training for loop after a certain number of batches, like so:

    for local_batch, local_labels in train_loader:
        cont += 1
        if cont == number_u_want_to_stop:
            break  # Breaks out of the for loop and continues with the rest.
    
  2. Use your GPU for training and for inference (using a model to make predictions) whenever you can, because for PyTorch models it is often more than 20 times faster than even the best CPU.

  3. No, you don't have to set it to true again. That's the main point of the "with" syntax: once the code inside the with block has finished, the setting is restored to what it was before :). So you can delete the extra "with torch.set_grad_enabled(True):" line.

  4. Like I said in the 2nd point, use your GPU for all your projects, but keep in mind that you will need a graphics card with at least 4 GB of memory to train even small models.

    Here is the install command for using the GPU on Windows:

    pip3 install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio===0.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

    And here is the one for Linux:

    pip3 install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

And here is a link to the PyTorch docs that explain how to use the GPU in PyTorch.
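
As a rough sketch of where ".to(...)" usually goes, assuming a small "nn.Linear" model as a stand-in and a hypothetical "get_device()" helper (the asker's own version is not shown). Note that scikit-learn's ElasticNet and GridSearchCV from the question run on the CPU and have no ".to()" method, so this only applies once the model itself is a PyTorch module:

```python
import torch
from torch import nn


def get_device():
    # Hypothetical helper: the asker's get_device() is not shown, so this is a guess.
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


device = get_device()

# Stand-in model (an assumption for illustration): 2000 features -> 1 output.
# Move the parameters to the device once, before the training loop.
model = nn.Linear(2000, 1).to(device)

# Fake batch standing in for one (local_batch, local_labels) pair from a DataLoader.
local_batch = torch.randn(8, 2000)
local_labels = torch.randint(0, 2, (8, 1)).float()

# Inside the loop, move every batch to the same device as the model.
local_batch = local_batch.to(device)
local_labels = local_labels.to(device)

logits = model(local_batch)
loss = nn.functional.binary_cross_entropy_with_logits(logits, local_labels)
loss.backward()  # gradients now live on the same device as the parameters
```

The same code falls back to the CPU when no CUDA device is available, so it can be tried on any machine first.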

  5. A very nice starter project, which probably everyone has done when starting out with machine learning (especially those interested in computer vision), is image classification on the MNIST dataset. There are many great tutorials out there. At first it will be overwhelming with all the new terms, but I promise it gets better once you start to speak the same language as the people writing those tutorials. So first follow a tutorial, and if you don't understand a term, just google it on its own and work through the material in small pieces, because otherwise it will be very hard to comprehend. After you have gained some basic knowledge, you can start to build your own small projects. Start with something small, and keep grinding :)
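
Coming back to the first and third points of this answer (using less data, and the "with" block), here is a small sketch under stated assumptions: the dataset is fake and tiny, the 10% fraction and batch size are arbitrary, and "Subset" is shown as one possible alternative to breaking out of the loop, not as the only way to do it.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Small fake dataset standing in for the real training data (sizes are made up).
features = torch.randn(1000, 20)
target = torch.randint(0, 2, (1000,))
train = TensorDataset(features, target)

# Using less data: keep a random 10% of the rows for a first tuning pass.
n_small = len(train) // 10
indices = torch.randperm(len(train))[:n_small].tolist()
small_train = Subset(train, indices)
small_loader = DataLoader(small_train, batch_size=32, shuffle=True)

# 100 rows at 32 rows per batch gives 4 batches (the last one is smaller).
n_batches = sum(1 for _ in small_loader)

# The "with" block: the previous grad mode is restored automatically on exit,
# so no extra "with torch.set_grad_enabled(True):" is needed afterwards.
before = torch.is_grad_enabled()
with torch.set_grad_enabled(False):
    inside = torch.is_grad_enabled()
after = torch.is_grad_enabled()
print(before, inside, after)
```

Because the subset is chosen once up front, every epoch of a quick experiment sees the same reduced data, which makes tuning runs easier to compare.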

Upvotes: 2
