I am working with this modified version of fastText (fastText_multilingual), which lets me align words in two languages.
I am trying to understand their fasttext.py, especially the FastVector class. In the example notebook align_your_own.ipynb, the authors show how to measure the similarity between two words. I would like to repeat the process for the whole set of words instead of measuring similarity for one word at a time. To do this I need to understand how to access these FastVector objects, which is why I am trying to understand the FastVector class.
I am stuck here:
import numpy as np

class FastVector:
    def __init__(self, vector_file='', transform=None):
        """Read in word vectors in fasttext format"""
        self.word2id = {}
        # Captures word order, for export() and translate methods
        self.id2word = []
        print('reading word vectors from %s' % vector_file)
        with open(vector_file, 'r') as f:
            (self.n_words, self.n_dim) = \
                (int(x) for x in f.readline().rstrip('\n').split(' '))
            self.embed = np.zeros((self.n_words, self.n_dim))
            for i, line in enumerate(f):
                elems = line.rstrip('\n').split(' ')
                self.word2id[elems[0]] = i
                self.embed[i] = elems[1:self.n_dim+1]
                self.id2word.append(elems[0])
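To see what these structures end up holding, here is a minimal sketch that repeats the same parsing loop on a tiny made-up sample (not a real fastText file) read from memory with io.StringIO, and then uses embed together with word2id/id2word to score one word against the whole vocabulary at once. The helper most_similar is hypothetical, not part of fastText_multilingual, and cosine similarity is assumed as the measure. float() makes explicit the string-to-number conversion that the original code leaves to NumPy's array assignment.

```python
import io

import numpy as np

# Tiny made-up sample in the .vec text layout: a header "n_words n_dim",
# then one line per word: the word followed by n_dim numbers.
sample = "3 2\ncat 1.0 0.0\ndog 0.8 0.6\ncar 0.0 1.0\n"
f = io.StringIO(sample)

n_words, n_dim = (int(x) for x in f.readline().rstrip('\n').split(' '))
embed = np.zeros((n_words, n_dim))  # all-zero matrix: one row per word, one column per dimension
word2id, id2word = {}, []
for i, line in enumerate(f):
    elems = line.rstrip('\n').split(' ')
    word2id[elems[0]] = i                              # word -> row number
    embed[i] = [float(x) for x in elems[1:n_dim + 1]]  # fill row i with that word's vector
    id2word.append(elems[0])                           # row number -> word


def most_similar(word, k=2):
    """Hypothetical helper: rank the other words by cosine similarity to `word`."""
    query = embed[word2id[word]]
    # One matrix-vector product scores the query against every row at once.
    sims = embed @ query / (np.linalg.norm(embed, axis=1) * np.linalg.norm(query))
    ranked = np.argsort(-sims)  # row indices, most similar first
    return [(id2word[j], float(sims[j])) for j in ranked if id2word[j] != word][:k]


print(embed.shape)          # (3, 2)
print(most_similar('cat'))  # dog ranked first, then car
```

With two aligned languages, the same idea would score a vector from one object's embed against the other object's embed matrix; that variant is not shown here.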
I have never created a class in Python, so this makes things more difficult for me. These are the lines I can't fully understand:
1. (self.n_words, self.n_dim) = \
2. self.word2id = {} and self.id2word = []
3. self.embed = np.zeros((self.n_words, self.n_dim))
These are my questions:
A backslash at the end of a line tells Python to continue the current logical line on the next physical line. In your case, you can read the two lines as a single line:
(self.n_words, self.n_dim) = (int(x) for x in f.readline().rstrip('\n').split(' '))
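That single line reads the header line of the file, splits it on spaces, converts each piece to int, and unpacks the two values into the two names on the left. A minimal sketch, assuming a small made-up sample in the .vec layout (header "n_words n_dim", then one word and its numbers per line) read from memory instead of from a real file:

```python
import io

# Made-up sample: header "3 4", then 3 words with 4 numbers each.
sample = "3 4\ncat 0.1 0.2 0.3 0.4\ndog 0.5 0.6 0.7 0.8\ncar 0.9 1.0 1.1 1.2\n"
f = io.StringIO(sample)

# Same statement as above, minus "self.": "3 4\n" -> ["3", "4"] -> 3, 4
(n_words, n_dim) = (int(x) for x in f.readline().rstrip('\n').split(' '))

print(n_words, n_dim)  # 3 4
# readline() consumed only the header, so a later loop over f starts at "cat ...".
```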
In Python, a variable is created the moment you first assign a value to it (https://www.w3schools.com/python/python_variables.asp), and the same holds for attributes on self. So word2id, id2word and embed are not keywords or special names; they are ordinary instance attributes that are created when __init__ assigns a value to them.
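A minimal sketch of that behaviour, using a hypothetical class Demo (not part of fastText_multilingual) that builds the same two lookups as FastVector: word2id is a dict from word to row index, and id2word is a list from row index back to word, which also preserves file order.

```python
class Demo:
    def __init__(self, words):
        # Neither attribute exists until these assignments run.
        self.word2id = {}  # word -> row index
        self.id2word = []  # row index -> word (a list keeps the original order)
        for i, w in enumerate(words):
            self.word2id[w] = i
            self.id2word.append(w)


d = Demo(["cat", "dog", "car"])
print(d.word2id)                 # {'cat': 0, 'dog': 1, 'car': 2}
print(d.id2word[1])              # dog
print(hasattr(Demo, "word2id"))  # False: the attribute lives on the instance, not the class
```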