Reputation: 2488
I want to make a visualization that shows how many words a substring belongs to. Some substrings may belong to the same set of words. For example, the substrings 'tion' and 'ten' are both substrings of the words 'detention' and 'attention'.
I thought about a tree representation, but my actual program has hundreds of these parent-to-child relationships, and since two or three parents may share the same child, a tree gets complicated quickly. I therefore think a network would work better.
Here's the code that sets it up.
from collections import defaultdict

words = ['mention', 'detention', 'attention', 'iteraction', 'interception', 'solution', 'iteraction',
         'reiteration', 'determination', 'tension', 'tentative', 'intention', 'solution',
         'tentative', 'concatenation', 'alternative', 'bitter', 'asterisk']

substring_dict = defaultdict(list)

ter = 'ter'
tion = 'tion'
ten = 'ten'

for entry in words:
    if ter in entry:
        substring_dict[ter].append(entry)
    if tion in entry:
        substring_dict[tion].append(entry)
    if ten in entry:
        substring_dict[ten].append(entry)
substring_dict is a dictionary of lists where the key is the substring and the value is the list of words that the substring belongs to.
How do I represent this visually? I was thinking I could color code the nodes as well.
Upvotes: 0
Views: 58
Reputation: 19805
You can use networkx to visualize your graph.
Let's first make a small change in your pre-processing:
words = ['mention', 'detention', 'attention', 'iteraction', 'interception', 'solution', 'iteraction',
         'reiteration', 'determination', 'tension', 'tentative', 'intention', 'solution',
         'tentative', 'concatenation', 'alternative', 'bitter', 'asterisk']
subs = ['ter', 'tion', 'ten']

# One (word, substring) edge for every substring found in a word
edges = []
for word in words:
    for sub in subs:
        if sub in word:
            edges.append((word, sub))

print(edges[0:6])
# prints [('mention', 'tion'), ('detention', 'tion'), ('detention', 'ten'), ('attention', 'tion'), ('attention', 'ten'), ('iteraction', 'ter')]
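Since the question asks how many words each substring belongs to and which words share the same set of substrings, the same data can be summarized before plotting. This is only a sketch: the names unique_words, subs_of_word, word_count and groups are made up for illustration, and it assumes the repeated entries in words should be counted once.

```python
from collections import defaultdict

words = ['mention', 'detention', 'attention', 'iteraction', 'interception', 'solution', 'iteraction',
         'reiteration', 'determination', 'tension', 'tentative', 'intention', 'solution',
         'tentative', 'concatenation', 'alternative', 'bitter', 'asterisk']
subs = ['ter', 'tion', 'ten']

# Assumption: duplicates in the word list are not meaningful, so drop them
unique_words = sorted(set(words))

# Map each word to the (hashable) set of substrings it contains
subs_of_word = {w: frozenset(s for s in subs if s in w) for w in unique_words}

# Number of distinct words each substring occurs in
word_count = {s: sum(s in w for w in unique_words) for s in subs}

# Group words that contain exactly the same substrings;
# each group could later be given its own node color
groups = defaultdict(list)
for w, contained in subs_of_word.items():
    groups[contained].append(w)

for contained, members in groups.items():
    print(sorted(contained), members)
```

Each key of groups is one combination of substrings, so assigning one color per key would color code the word nodes by the substrings they share.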
Let's start plotting:
import networkx as nx
import matplotlib.pyplot as plt

g = nx.Graph()
g.add_nodes_from(subs)
g.add_nodes_from(words)
g.add_edges_from(edges)

# Compute node positions once and reuse them for every draw call
pos = nx.spring_layout(g)

# Substring nodes in red
nx.draw_networkx_nodes(g, pos,
                       nodelist=subs,
                       node_color='r',
                       node_size=1000,
                       alpha=0.8)

# Word nodes in blue
nx.draw_networkx_nodes(g, pos,
                       nodelist=words,
                       node_color='b',
                       node_size=1000,
                       alpha=0.8)

nx.draw_networkx_edges(g, pos, width=1.0, alpha=0.5)

nx.draw_networkx_labels(g, pos, dict(zip(subs, subs)))
nx.draw_networkx_labels(g, pos, dict(zip(words, words)))

plt.show()
It produces a plot with the substrings drawn as red nodes and the words as blue nodes.
Notes:
The node positions are computed with nx.spring_layout, which should be changed.
Upvotes: 1
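One possible replacement for the spring layout is a two-column arrangement with the substrings on one side and the words on the other. The sketch below builds such a position dictionary by hand; two_column_layout is a hypothetical helper written for this example, not a networkx function. networkx also provides nx.bipartite_layout, which may serve the same purpose.

```python
def two_column_layout(left_nodes, right_nodes):
    """Return {node: (x, y)} with left_nodes at x=0 and right_nodes at x=1."""
    pos = {}
    for x, column in ((0.0, left_nodes), (1.0, right_nodes)):
        n = len(column)
        for i, node in enumerate(column):
            # Spread the nodes evenly over [0, 1]; center a lone node
            y = 0.5 if n == 1 else i / (n - 1)
            pos[node] = (x, y)
    return pos


subs = ['ter', 'tion', 'ten']
# Duplicates removed so that each word gets exactly one position
words = sorted(set(['mention', 'detention', 'attention', 'iteraction', 'interception',
                    'solution', 'iteraction', 'reiteration', 'determination', 'tension',
                    'tentative', 'intention', 'solution', 'tentative', 'concatenation',
                    'alternative', 'bitter', 'asterisk']))

pos = two_column_layout(subs, words)
print(pos['tion'], pos['asterisk'])
```

The resulting pos dictionary has the same shape as the one returned by nx.spring_layout (node mapped to a 2D coordinate), so it should be usable in its place in the draw calls.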