How to represent this as a Network?

Question

I want to make a visualization that shows how many words a substring belongs to. Some substrings may belong to the same set of words. For example, the substrings tion and ten are both substrings of the words detention and attention.

I thought about a tree representation, but in my actual program there are hundreds of these parent-to-child relationships, and since two or three parents may share the same child, a tree can get really complicated. Therefore, I think a network would work better.

Here's the code that sets it up.

from collections import defaultdict

words = ['mention', 'detention', 'attention', 'iteraction', 'interception', 'solution', 'iteraction',
     'reiteration', 'determination', 'tension', 'tentative', 'intention', 'solution',
     'tentative', 'concatenation', 'alternative', 'bitter', 'asterisk']

substring_dict = defaultdict(list)
ter = 'ter'
tion = 'tion'
ten = 'ten'

for entry in words:
    if ter in entry:
        substring_dict[ter].append(entry)
    if tion in entry:
        substring_dict[tion].append(entry)
    if ten in entry:
        substring_dict[ten].append(entry)

substring_dict is a dictionary of lists where the key is the substring and the value is the list of words that the substring belongs to.

How do I represent this visually? I was thinking I could color code the nodes as well.

Sait · Accepted Answer

You can use networkx to visualize your graph.

Let's first make a small change in your pre-processing:

words = ['mention', 'detention', 'attention', 'iteraction', 'interception', 'solution', 'iteraction','reiteration', 'determination', 'tension', 'tentative', 'intention', 'solution', 'tentative', 'concatenation', 'alternative', 'bitter', 'asterisk']

subs = ['ter','tion','ten']
edges = []

for word in words:
    for sub in subs:
        if sub in word:
            edges.append( (word, sub) )

print(edges[0:6])

# prints [('mention', 'tion'), ('detention', 'tion'), ('detention', 'ten'), ('attention', 'tion'), ('attention', 'ten'), ('iteraction', 'ter')]
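Since the question asks how many words each substring belongs to, the same loop can also produce those counts. The following is a minimal sketch rather than part of the original answer; the names words_by_sub, counts, and shared are made up for illustration, and it uses sets on the assumption that a word repeated in the list should only count once.

```python
from collections import defaultdict

words = ['mention', 'detention', 'attention', 'iteraction', 'interception',
         'solution', 'iteraction', 'reiteration', 'determination', 'tension',
         'tentative', 'intention', 'solution', 'tentative', 'concatenation',
         'alternative', 'bitter', 'asterisk']
subs = ['ter', 'tion', 'ten']

# Map each substring to the set of distinct words that contain it.
# A set is used because the sample list repeats some words.
words_by_sub = defaultdict(set)
for word in words:
    for sub in subs:
        if sub in word:
            words_by_sub[sub].add(word)

# Number of distinct words per substring.
counts = {sub: len(found) for sub, found in words_by_sub.items()}

# Words that two substrings have in common, here 'tion' and 'ten'.
shared = words_by_sub['tion'] & words_by_sub['ten']

print(counts)
print(sorted(shared))
```

In the graph built below, these counts are simply the degrees of the substring nodes, so they could also drive node size or color.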

Let's start plotting:

import networkx as nx
import matplotlib.pyplot as plt

g = nx.Graph()
g.add_nodes_from(subs)
g.add_nodes_from(words)
g.add_edges_from(edges)
pos=nx.spring_layout(g)

nx.draw_networkx_nodes(g, pos,
                       nodelist=subs,
                       node_color='r',
                       node_size=1000,
                       alpha=0.8)

nx.draw_networkx_nodes(g, pos,
                       nodelist=words,
                       node_color='b',
                       node_size=1000,
                       alpha=0.8)

nx.draw_networkx_edges(g, pos, width=1.0, alpha=0.5)

# Label every node with its own name.
nx.draw_networkx_labels(g, pos, dict(zip(subs,subs)) )
nx.draw_networkx_labels(g, pos, dict(zip(words,words)) )

plt.show()

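It produces a plot with the substrings as red nodes and the words as blue nodes, with an edge wherever a word contains a substring. (Image not included.)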

Notes:

You might want to work on the placement of the nodes: nx.spring_layout is used here, and a different layout may suit this graph better.
Adjust the node sizes so that the labels do not extend outside the nodes.
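Both notes can be tried out with a sketch like the one below. It is only one possible approach, not the original answer's: it swaps spring_layout for nx.bipartite_layout (substrings in one column, words in the other) and scales node size by degree. The helper name draw and the constants 300 and 200 are arbitrary choices for illustration.

```python
import networkx as nx

words = ['mention', 'detention', 'attention', 'iteraction', 'interception',
         'solution', 'iteraction', 'reiteration', 'determination', 'tension',
         'tentative', 'intention', 'solution', 'tentative', 'concatenation',
         'alternative', 'bitter', 'asterisk']
subs = ['ter', 'tion', 'ten']

g = nx.Graph()
g.add_nodes_from(subs)
g.add_nodes_from(words)
g.add_edges_from((word, sub) for word in words for sub in subs if sub in word)

# Assumption: a two-column layout reads better here than spring_layout.
# The nodes passed as the second argument are placed together in one column.
pos = nx.bipartite_layout(g, subs)

# Node size grows with the number of neighbours, so a substring that
# appears in many words is drawn larger. The constants are arbitrary.
sizes = {node: 300 + 200 * g.degree(node) for node in g}


def draw(g, pos, sizes, subs):
    """Draw substrings in red and words in blue using the given layout."""
    # Imported here so the layout and sizes can be computed without matplotlib.
    import matplotlib.pyplot as plt

    nodes = list(g)
    colors = ['r' if node in subs else 'b' for node in nodes]
    nx.draw_networkx(g, pos,
                     nodelist=nodes,
                     node_size=[sizes[node] for node in nodes],
                     node_color=colors,
                     alpha=0.8)
    plt.axis('off')
    plt.show()
```

Calling draw(g, pos, sizes, subs) shows the plot. With hundreds of substrings the two columns will get crowded, so the layout choice is worth revisiting for the real data.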
