This is my code. It reads reviews from an Excel file (the `rev` column) and builds a list of lists. `xp` looks like this:
["['intrepid', 'bumbling', 'duo', 'deliver', 'good', 'one'],['better', 'offering', 'considerable', 'cv', 'freshly', 'qualified', 'private', 'investigator', 'thrust', 'murder', 'investigation', 'invisible'],[ 'man', 'alone', 'tell', 'fun', 'flow', 'decent', 'clip', 'need', 'say', 'sequence', 'comedy', 'gold', 'like', 'scene', 'restaurant', 'excellent', 'costello', 'pretending', 'work', 'ball', 'gym', 'final', 'reel']"]
But when I use the list for the model, it gives me the error "TypeError: 'float' object is not iterable". I don't know where my problem is. Thanks.
xp = []
import gensim
import logging
import pandas as pd

file = r'FileNamelast.xlsx'
df = pd.read_excel(file, sheet_name='FileNamex')
pages = [i for i in range(0, 1000)]
for page in pages:
    text = df.loc[page, ["rev"]]
    xp.append(text[0])

model = gensim.models.Word2Vec(xp, size=150, window=10, min_count=2,
                               workers=10)
model.train(xp, total_examples=len(xp), epochs=10)
This is what I got: TypeError: 'float' object is not iterable
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-32-aa34c0e432bf> in <module>()
14
15
---> 16 model = gensim.models.Word2Vec (xp, size=150, window=10, min_count=2, workers=10)
17 model.train(xp,total_examples=len(xp),epochs=10)
C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\word2vec.py in __init__(self, sentences, corpus_file, size, alpha, window, min_count, max_vocab_size, sample, seed, workers, min_alpha, sg, hs, negative, ns_exponent, cbow_mean, hashfxn, iter, null_word, trim_rule, sorted_vocab, batch_words, compute_loss, callbacks, max_final_vocab)
765 callbacks=callbacks, batch_words=batch_words, trim_rule=trim_rule, sg=sg, alpha=alpha, window=window,
766 seed=seed, hs=hs, negative=negative, cbow_mean=cbow_mean, min_alpha=min_alpha, compute_loss=compute_loss,
--> 767 fast_version=FAST_VERSION)
768
769 def _do_train_epoch(self, corpus_file, thread_id, offset, cython_vocab, thread_private_mem, cur_epoch,
C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\base_any2vec.py in __init__(self, sentences, corpus_file, workers, vector_size, epochs, callbacks, batch_words, trim_rule, sg, alpha, window, seed, hs, negative, ns_exponent, cbow_mean, min_alpha, compute_loss, fast_version, **kwargs)
757 raise TypeError("You can't pass a generator as the sentences argument. Try an iterator.")
758
--> 759 self.build_vocab(sentences=sentences, corpus_file=corpus_file, trim_rule=trim_rule)
760 self.train(
761 sentences=sentences, corpus_file=corpus_file, total_examples=self.corpus_count,
C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\base_any2vec.py in build_vocab(self, sentences, corpus_file, update, progress_per, keep_raw_vocab, trim_rule, **kwargs)
934 """
935 total_words, corpus_count = self.vocabulary.scan_vocab(
--> 936 sentences=sentences, corpus_file=corpus_file, progress_per=progress_per, trim_rule=trim_rule)
937 self.corpus_count = corpus_count
938 self.corpus_total_words = total_words
C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\word2vec.py in scan_vocab(self, sentences, corpus_file, progress_per, workers, trim_rule)
1569 sentences = LineSentence(corpus_file)
1570
-> 1571 total_words, corpus_count = self._scan_vocab(sentences, progress_per, trim_rule)
1572
1573 logger.info(
C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\word2vec.py in _scan_vocab(self, sentences, progress_per, trim_rule)
1552 sentence_no, total_words, len(vocab)
1553 )
-> 1554 for word in sentence:
1555 vocab[word] += 1
1556 total_words += len(sentence)
TypeError: 'float' object is not iterable
The `sentences` corpus argument to `Word2Vec` should be an iterable sequence of lists-of-word-tokens. Your reported value for `xp` is actually a list with one long string in it:

[
"['intrepid', 'bumbling', 'duo', 'deliver', 'good', 'one'],['better', 'offering', 'considerable', 'cv', 'freshly', 'qualified', 'private', 'investigator', 'thrust', 'murder', 'investigation', 'invisible'],[ 'man', 'alone', 'tell', 'fun', 'flow', 'decent', 'clip', 'need', 'say', 'sequence', 'comedy', 'gold', 'like', 'scene', 'restaurant', 'excellent', 'costello', 'pretending', 'work', 'ball', 'gym', 'final', 'reel']"
]

I don't see how this would give the error you've reported, but it's definitely wrong, so it should be fixed. You should perhaps print `xp` just before you instantiate `Word2Vec`, to be sure you know what it contains.
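One quick way to verify that (a hypothetical helper, not from the original post) is a small checker that flags any corpus item that isn't a list:

```python
# Hypothetical helper: report the index and type of any corpus item
# that is not a list-of-tokens (e.g. a stray string or a NaN float).
def check_corpus(corpus):
    problems = []
    for i, sentence in enumerate(corpus):
        if not isinstance(sentence, list):
            problems.append((i, type(sentence).__name__))
    return problems
```

For example, `check_corpus([['good', 'one'], 'oops', float('nan')])` returns `[(1, 'str'), (2, 'float')]`. A NaN float among the sentences is exactly the kind of item that makes `Word2Vec`'s vocab scan fail on `for word in sentence:` with "TypeError: 'float' object is not iterable".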
A true list, with each item being a list-of-string-tokens, would work. So if `xp` were the following, that'd be correct:

[
['intrepid', 'bumbling', 'duo', 'deliver', 'good', 'one'],
['better', 'offering', 'considerable', 'cv', 'freshly', 'qualified', 'private', 'investigator', 'thrust', 'murder', 'investigation', 'invisible'],
['man', 'alone', 'tell', 'fun', 'flow', 'decent', 'clip', 'need', 'say', 'sequence', 'comedy', 'gold', 'like', 'scene', 'restaurant', 'excellent', 'costello', 'pretending', 'work', 'ball', 'gym', 'final', 'reel']
]
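One way to get `xp` into that shape straight from the DataFrame (a sketch: the `rev` column name comes from the question, while whitespace tokenization and the non-string guard are assumptions; empty Excel cells come back from pandas as float NaN, which would explain the error):

```python
import pandas as pd

def build_corpus(df, column='rev'):
    corpus = []
    for value in df[column]:
        # Empty cells are float NaN; skip them so the vocab scan
        # never tries to iterate over a float.
        if not isinstance(value, str):
            continue
        corpus.append(value.split())  # simple whitespace tokenization
    return corpus
```

With a frame like `pd.DataFrame({'rev': ['good fun movie', float('nan'), 'decent clip']})`, this yields `[['good', 'fun', 'movie'], ['decent', 'clip']]`, which is the list-of-lists shape `Word2Vec` expects.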
Note, however:

- `Word2Vec` doesn't do well with toy-sized datasets. So while this tiny setup may be helpful to check for basic syntax/format issues, don't expect realistic results until you're training with many hundreds-of-thousands of words.
- You don't need to call `train()` if you already supplied your corpus at instantiation, as you have. The model will do all steps automatically. (If, on the other hand, you don't supply your corpus, you'd then have to call both `build_vocab()` and `train()`.) If you enable logging at the INFO level, all the steps happening behind the scenes will be clearer.