Confused about tensor dimensions and batch sizes in pytorch

So I'm very new to PyTorch and Neural Networks in general, and I'm having some problems creating a Neural Network that classifies names by gender.
I based this off of the PyTorch tutorial for RNNs that classify names by nationality, but I decided not to go with a recurrent approach... Stop me right here if this was the wrong idea!
However, whenever I try to run an input through the network it tells me:

RuntimeError: matrices expected, got 3D, 2D tensors at /py/conda-bld/pytorch_1493681908901/work/torch/lib/TH/generic/THTensorMath.c:1232

I know this has something to do with how PyTorch always expects there to be a batch size or something, and I have my tensor set up that way, but you can probably tell by this point that I have no idea what I'm talking about. Here's my code:

from future import unicode_literals, print_function, division
from io import open
import glob
import unicodedata
import string
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import random
from torch.autograd import Variable
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

"""------GLOBAL VARIABLES------"""

all_letters = string.ascii_letters + " .,;'"
num_letters = len(all_letters)
all_names = {}
genders = ["Female", "Male"]

"""-------DATA EXTRACTION------"""

def findFiles(path):
    return glob.glob(path)

def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in all_letters
    )

# Read a file and split into lines
def readLines(filename):
    lines = open(filename, encoding='utf-8').read().strip().split('\n')
    return [unicodeToAscii(line) for line in lines]

for file in findFiles("/home/andrew/PyCharm/PycharmProjects/CantStop/data/names/*.txt"):
    gender = file.split("/")[-1].split(".")[0]
    names = readLines(file)
    all_names[gender] = names

"""-----DATA INTERPRETATION-----"""

def nameToTensor(name):
    tensor = torch.zeros(len(name), 1, num_letters)
    for index, letter in enumerate(name):
        tensor[index][0][all_letters.find(letter)] = 1
    return tensor

def outputToGender(output):
    gender, gender_index = output.data.topk(1)
    if gender_index[0][0] == 0:
        return "Female"
    return "Male"

"""------NETWORK SETUP------"""

class Net(nn.Module):
    def __init__(self, input_size, output_size):
        super(Net, self).__init__()
        #Layer 1
        self.Lin1 = nn.Linear(input_size, int(input_size/2))
        self.ReLu1 = nn.ReLU()
        self.Batch1 = nn.BatchNorm1d(int(input_size/2))
        #Layer 2
        self.Lin2 = nn.Linear(int(input_size/2), output_size)
        self.ReLu2 = nn.ReLU()
        self.Batch2 = nn.BatchNorm1d(output_size)
        self.softMax = nn.LogSoftmax()

    def forward(self, input):
        output1 = self.Batch1(self.ReLu1(self.Lin1(input)))
        output2 = self.softMax(self.Batch2(self.ReLu2(self.Lin2(output1))))
        return output2

NN = Net(num_letters, 2)

"""------TRAINING------"""

def getRandomTrainingEx():
    gender = genders[random.randint(0, 1)]
    name = all_names[gender][random.randint(0, len(all_names[gender])-1)]
    gender_tensor = Variable(torch.LongTensor([genders.index(gender)]))
    name_tensor = Variable(nameToTensor(name))
    return gender_tensor, name_tensor, gender

def train(input, target):
    loss_func = nn.NLLLoss()

    optimizer = optim.SGD(NN.parameters(), lr=0.0001, momentum=0.9)

    optimizer.zero_grad()

    output = NN(input)

    loss = loss_func(output, target)
    loss.backward()
    optimizer.step()

    return output, loss

all_losses = []
current_loss = 0

for i in range(100000):
    gender_tensor, name_tensor, gender = getRandomTrainingEx()
    output, loss = train(name_tensor, gender_tensor)
    current_loss += loss

    if i%1000 == 0:
        print("Guess: %s, Correct: %s, Loss: %s" % (outputToGender(output), gender, loss.data[0]))

    if i%100 == 0:
        all_losses.append(current_loss/10)
        current_loss = 0

# plt.figure()
# plt.plot(all_losses)
# plt.show()

Please help a newbie out!

Upvotes: 6

Answers (2)

DarkCygnus

Reputation: 7838

First, regarding your error, as other answers say and also your exception, it is probably because your input parameters are not shaped correctly. You could try debugging to isolate the line that gives the error, and then edit your question with it, so we know for sure what is causing the problem and correct it (without full stack trace it is harder to know what is the problem).

Now, you are trying to implement a Neural Network that classifies names by gender, as you indicated. We can see that this task will require to somehow input a name (which have different sizes) and output a gender (a binary variable: male, female). However, Neural Networks in general are built and trained to classify inputs (vectors) of fixed size of features, like they mention in the pytorch docs:

Parameters: input_size – The number of expected features in the input x

...

Looking at the tutorial you mentioned, they do consider this situation, as in their case the input for the network is a single letter transformed to a "one-hot vector", as they indicate:

To run a step of this network we need to pass an input (in our case, the Tensor for the current letter) and a previous hidden state (which we initialize as zeros at first). We’ll get back the output (probability of each language) and a next hidden state (which we keep for the next step).

And even give an example of it (remember tensors are Variables in pytorch):

input = Variable(letterToTensor('A'))
hidden = Variable(torch.zeros(1, n_hidden))
output, next_hidden = rnn(input, hidden)

Note: That being said, there are some other things you can do to adapt your implementation to variable-sized inputs. Based on my experience and also complemented by this and this other great questions, you could:

Preprocess your data to extract new features and transform it to fixed-size inputs. This is usually the most used approach but requires experience and patience to get good features. Some techniques used are PCA (Principal Component Analysis) and LDA (Latent Dirichlet Allocation)

For example, you could extract from your data features like: the length of the name, the number of letter a's in the name (female names tend to have more a's), the number of letter e's in the name (the same but with male names maybe?), and others... so you can generate new features like [name_length, a_found, e_found, ...]. Then you could follow a regular approach with you new fixed-size vectors. Do note that those features have to be meaningful; these ones I just came up for example (although they could work).
Split your input names into fixed-sized substring (or iterate them with a sliding window), so then you can classify them with a network designed for that size and combine the outputs in an ensemble way to obtain the final classification.

Upvotes: 0

Haha TTpro

Reputation: 5546

Debugging your bug out:

Pycharm is a helpful python debugger that let you set breakpoint and views dimension of your tensor.
For easier debug, do not stack forward thing up like that

output1 = self.Batch1(self.ReLu1(self.Lin1(input)))

Instead,

h1 = self.ReLu1(self.Lin1(input))
h2 = self.Batch1(h1)

For the stacktrace, Pytorch also provide Pythonic error stacktrack. I believe that before

RuntimeError: matrices expected, got 3D, 2D tensors at /py/conda-bld/pytorch_1493681908901/work/torch/lib/TH/generic/THTensorMath.c:1232

There are some python error stacktrace that point right into your code. For easier debug, as I said, don't stack forward.

You use Pycharm to create break point before crash point. In debugger watcher Then use Variable(torch.rand(dim1, dim2)) to test out forward pass input, output dimension, and if a dimension is incorrect. Comparing with dimension of input. Call input.size() in debugger watcher.

For example, self.ReLu1(self.Lin1(Variable(torch.rand(10, 20)))).size() . If it show read text (error), then the input dimension is incorrect. Else, it show the size of the output.

Read the docs

In Pytorch Docs, it specify input/output dimension. It also have a example code snip

>>> rnn = nn.RNN(10, 20, 2)
>>> input = Variable(torch.randn(5, 3, 10))
>>> h0 = Variable(torch.randn(2, 3, 20))
>>> output, hn = rnn(input, h0)

You may use the code snip in PyCharm Debugger to explore dimension of input, output of specific layer of your interest (RNN, Linear, BatchNorm1d).

Upvotes: 2

Confused about tensor dimensions and batch sizes in pytorch

Answers (2)

Related Questions