KeyPi

Reputation: 526

Understanding FastText multilingual

I am working with this modified version of FastText (fastText_multilingual) that will let me align words in two languages.

I am trying to understand their fasttext.py, and especially the FastVector class. In the example file align_your_own.ipynb, the authors show how to measure the similarity between two words. I would like to repeat the process for the whole set of words instead of measuring similarity one word at a time. To do this I need to understand how to access the data inside these FastVector objects, which is why I am trying to understand the FastVector class.

I am stuck here:

def __init__(self, vector_file='', transform=None):
    """Read in word vectors in fasttext format"""
    self.word2id = {}

    # Captures word order, for export() and translate methods
    self.id2word = []

    print('reading word vectors from %s' % vector_file)
    with open(vector_file, 'r') as f:
        (self.n_words, self.n_dim) = \
            (int(x) for x in f.readline().rstrip('\n').split(' '))
        self.embed = np.zeros((self.n_words, self.n_dim))
        for i, line in enumerate(f):
            elems = line.rstrip('\n').split(' ')
            self.word2id[elems[0]] = i
            self.embed[i] = elems[1:self.n_dim+1]
            self.id2word.append(elems[0])

I have never created a class in Python, so this makes things more difficult for me. These are the lines that I can't understand in depth:

 1. (self.n_words, self.n_dim) = \
 2. self.word2id = {} and self.id2word = []
 3. self.embed = np.zeros((self.n_words, self.n_dim))

These are my questions:

Upvotes: 1

Views: 2252

Answers (1)

A backslash at the end of a line tells Python to continue the current logical line onto the next physical line. In your case, you can read the two lines as a single line:

(self.n_words, self.n_dim) = (int(x) for x in f.readline().rstrip('\n').split(' '))
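The right-hand side of that line is a generator expression that yields one int per space-separated token, and the parentheses on the left unpack those values into the two names. Here is a minimal sketch of the same statement outside the class; the header string is a made-up example of the "word count, dimension" first line that the loader appears to expect, not data from a real vector file.

```python
# Hypothetical first line of a vector file: "<n_words> <n_dim>"
header = "3 2\n"

# Backslash form, as in the question: one logical line split over two physical lines
(n_words, n_dim) = \
    (int(x) for x in header.rstrip('\n').split(' '))

# The same statement written on a single line
(n_words_one_line, n_dim_one_line) = (int(x) for x in header.rstrip('\n').split(' '))

print(n_words, n_dim)  # prints: 3 2
```

Unpacking only works when the number of values matches the number of names, so a header with more or fewer than two tokens would raise a ValueError.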

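In Python, a variable is created the moment you first assign a value to it (https://www.w3schools.com/python/python_variables.asp). So word2id, id2word and embed are not keywords; they are attributes of the object (self) that come into existence when __init__ assigns to them. self.word2id = {} creates an empty dictionary that will map each word to its row index, and self.id2word = [] creates an empty list that will hold the words in file order. np.zeros((self.n_words, self.n_dim)) creates an n_words by n_dim array filled with zeros, which the loop then fills in one row per word.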
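To see these attributes being created and filled, here is a small sketch that mirrors the loop from the question. TinyVectors is a hypothetical stand-in for FastVector, and the two-word "file" is invented sample data read from memory with io.StringIO instead of from disk, so treat it as an illustration and not as the library's actual code.

```python
import io
import numpy as np


class TinyVectors:
    """Hypothetical stand-in for FastVector that reads from a file-like object."""

    def __init__(self, f):
        # Attributes are created here, at the moment of first assignment
        self.word2id = {}   # word -> row index in self.embed
        self.id2word = []   # row index -> word, in file order
        self.n_words, self.n_dim = (int(x) for x in f.readline().rstrip('\n').split(' '))
        # Preallocate one row per word, then fill the rows in the loop
        self.embed = np.zeros((self.n_words, self.n_dim))
        for i, line in enumerate(f):
            elems = line.rstrip('\n').split(' ')
            self.word2id[elems[0]] = i
            self.embed[i] = [float(v) for v in elems[1:self.n_dim + 1]]
            self.id2word.append(elems[0])


# Invented sample data: 2 words, 3 dimensions
sample = io.StringIO("2 3\ncat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")
vecs = TinyVectors(sample)

print(vecs.word2id)                      # dictionary mapping each word to its row
print(vecs.id2word)                      # list of words in file order
print(vecs.embed.shape)                  # (n_words, n_dim)
print(vecs.embed[vecs.word2id["dog"]])   # the vector stored for "dog"
```

Since embed is an ordinary NumPy array whose row i belongs to id2word[i], looping over id2word (or working on embed as a whole) is one way to reach every word's vector instead of looking words up one at a time.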

Upvotes: 2
