Reputation: 38155
I have 100 nodes and 4950 edges. What is the fastest way to create a graph in Python (I am not planning to visualize or draw it at all) so that I can look up node information, i.e. so that I know what each entry in the 2D matrix means, for example that node 1 is connected to node 3? (I also don't need to save it as a matrix.)
import gensim
import nltk
from gensim.models import word2vec
from nltk.corpus import stopwords
import logging
import re
import itertools
import glob
from collections import defaultdict
import networkx as nx
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',
level=logging.INFO)
sentences = word2vec.Text8Corpus("/home/mona/mscoco/text8")
model = word2vec.Word2Vec(sentences, workers = 16)
#model.init_sims(replace = True)
model_name = "text8_data"
model.save(model_name)
stopwords = nltk.corpus.stopwords.words('english')
path = "/home/mona/mscoco/caption_files/*.txt"
files = glob.glob(path)
# Undirected graph: one node per caption file, edges weighted by Word Mover's Distance.
g = nx.Graph()
for file in files:
    g.add_node(file)
for file1, file2 in itertools.combinations(files, 2):
    # The with-blocks close the files on exit, so no explicit close() is needed.
    with open(file1) as f1:
        f1_text = f1.read()
        f1_words = re.sub("[^a-zA-Z]", ' ', f1_text).lower().split()
        f1_words = [w for w in f1_words if w not in stopwords]
        print(f1_text)
    with open(file2) as f2:
        f2_text = f2.read()
        f2_words = re.sub("[^a-zA-Z]", ' ', f2_text).lower().split()
        f2_words = [w for w in f2_words if w not in stopwords]
        print(f2_text)
    # Compute the distance once and reuse it.
    # Note: on gensim >= 4.0 this call lives on model.wv (model.wv.wmdistance).
    distance = float(model.wmdistance(f1_words, f2_words))
    print('{0}: {1}: {2}'.format(file1, file2, distance))
    # The edge attribute must be passed as a keyword argument.
    g.add_edge(file1, file2, weight=distance)
    print(g.number_of_edges())
print(g.number_of_edges())
nx.write_gml(g, "gensim.gml")
Please let me know if you have a better suggestion than my current code. I will eventually have something like 20 nodes and 190 edges. I am mostly looking for something whose output would be easy to process in another program such as MATLAB. I am not sure whether .gml files are easy to process in MATLAB.
Upvotes: 0
Views: 556
Reputation: 1574
I think generating a GML file for the precise purpose of reusing it in MATLAB is probably fine. This question has some more information about that:
Convert GML file to adjacency matrix in matlab
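If the pairwise distances are all you need on the MATLAB side, one possible alternative to GML is to write them out as a plain CSV matrix. The following is only a minimal sketch using the standard library. The helper names build_distance_matrix and write_matrix_csv, the toy file names, and the toy distance function are hypothetical stand-ins for illustration; in the question's code the distance would come from the word2vec model instead.

```python
import csv
import itertools


def build_distance_matrix(nodes, distance):
    """Return a symmetric len(nodes) x len(nodes) list-of-lists matrix.

    Row/column i corresponds to nodes[i]; the diagonal stays 0.0.
    `distance` is any callable taking two nodes and returning a number.
    """
    n = len(nodes)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        d = float(distance(nodes[i], nodes[j]))
        matrix[i][j] = d
        matrix[j][i] = d  # undirected graph, so mirror the entry
    return matrix


def write_matrix_csv(matrix, path):
    """Write the matrix as comma-separated rows, one row per node."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(matrix)


if __name__ == "__main__":
    # Hypothetical stand-ins: real code would pass the caption files and a
    # function that computes the Word Mover's Distance between two of them.
    names = ["a.txt", "bb.txt", "cccc.txt"]

    def toy_distance(a, b):
        return abs(len(a) - len(b))

    write_matrix_csv(build_distance_matrix(names, toy_distance), "distances.csv")
```

Entry (i, j) of the resulting file is the distance between the i-th and j-th node in the order you passed them, so keeping that list of names around answers the "what does each entry mean" part of the question. On the MATLAB side, a file like this should be loadable with readmatrix (or csvread on older releases).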
Upvotes: 1