Reputation: 1319
I want to train word2vec and fasttext to get vectors for a specific dataset that I have.
What should my model take as input?
My file is like this:
Customer_4: I want to book a ticket to New York.
Agent_9: Okay, when do you want the tickets for
Customer_4: hmm, wait a sec
Agent_9: Sure
Customer_4: When is the least expensive to fly
Now, How should I prepare my data for word2vec to run? Does the word2vec model take inter sentence similaarity into account, i.e. should i not prepare the corpus sentence wise.
Upvotes: 2
Views: 649
Reputation: 1607
One way would be that you first split your document into lines, then for each line, split the line into tokens. Then you end up with a corpus of list of list of tokens. After that, you can feed it into the gensim word2vec model.
Upvotes: 1