Nate Cook3
Nate Cook3

Reputation: 596

How to implement Word2Vec in Java?

I installed word2Vec using this tutorial on by Ubuntu laptop. Is it completely necessary to install DL4J in order to implement word2Vec vectors in Java? I'm comfortable working in Eclipse and I'm not sure that I want all the other pre-requisites that DL4J wants me to install.

Ideally there would be a really easy way for me to just use the Java code I've already written (in Eclipse) and change a few lines -- so that word look-ups that I am doing would retrieve a word2Vec vector instead of the current retrieval process I'm using.


Also, I've looked into using GloVe, however, I do not have MatLab. Is it possible to use GloVe without MatLab? (I got an error while installing it because of this). If so, the same question as above goes... I have no idea how to implement it in Java.

Upvotes: 6

Views: 12824

Answers (2)

Adam Gibson
Adam Gibson

Reputation: 3205

Dl4j author here. Our word2vec implementation is targeted for people who need to have custom pipelines. I don't blame you for going the simple route here.

Our word2vec implementation is meant for when you want to do something with them not for messing around. The c word2vec format is pretty straight forward.

Here is parsing logic in java if you'd like: https://github.com/deeplearning4j/deeplearning4j/blob/374609b2672e97737b9eb3ba12ee62fab6cfee55/deeplearning4j-scaleout/deeplearning4j-nlp/src/main/java/org/deeplearning4j/models/embeddings/loader/WordVectorSerializer.java#L113

Hope that helps a bit

Upvotes: 9

Debasis
Debasis

Reputation: 3750

What is preventing you from saving the word2vec (the C program) output in text format and then read the file with a Java piece of code and load the vectors in a hashmap keyed by the word string?

Some code snippets:

// Class to store a hashmap of wordvecs
public class WordVecs {

    HashMap<String, WordVec> wordvecmap;
    ....
    void loadFromTextFile() {
        String wordvecFile = prop.getProperty("wordvecs.vecfile");
        wordvecmap = new HashMap();
        try (FileReader fr = new FileReader(wordvecFile);
            BufferedReader br = new BufferedReader(fr)) {
            String line;

            while ((line = br.readLine()) != null) {
                WordVec wv = new WordVec(line);
                wordvecmap.put(wv.word, wv);
            }
        }
        catch (Exception ex) { ex.printStackTrace(); }        
    }
    ....
}

// class for each wordvec
public class WordVec implements Comparable<WordVec> {
    public WordVec(String line) {
        String[] tokens = line.split("\\s+");
        word = tokens[0];
        vec = new float[tokens.length-1];
        for (int i = 1; i < tokens.length; i++)
            vec[i-1] = Float.parseFloat(tokens[i]);
        norm = getNorm();
    }
    ....
}

If you want to get the nearest neighbours for a given word, you can keep a list of N nearest pre-computed neighbours associated with each WordVec object.

Upvotes: 8

Related Questions