Reputation: 182
I'm not sure if this is possible or not, but I'm having a hard time figuring out where to start reading to find out.
I have a large amount of data like below:
       0      1     2    3     4
       xyres  zres  fms  flts  pts
11020  1      1     0    2     0
11105  1      1     1    0     5
10005  1      0     0    0     5
01106  0      1     1    0     6
01001  0      1     0    0     1
10121  1      0     1    2     1
00016  0      0     0    1     6
01127  0      1     1    2     7
01010  0      1     0    1     0
10001  1      0     0    0     1
I'd like to convert it to a tree structure like the one below, where two rows share a node at a given level whenever all the variables to the left of it have the same values.
xyres  zres   fms  flts pts
        ______0     ____6
       |      |____|
 ______0           1
|
|              ____0
|             |    |____1
0       ______0
|      |      |     ____1
|      |      |    |
|      |      |____1
|______|
       1       ____0
       |______|    |____6
              1
              |____
                   2
                   |____7
               ____0
              |    |____
        ______0         1
       |
 ______0
|      |______
1             1...etc.
|______
       1 .....etc.
Is it possible to do this automatically, so that I can obtain data in a tree structure that I can then use with packages like networkx or pygraphviz? Alternatively, any tips for basic introductory reading on creating tree data structures, for someone without any formal programming background? What I've found so far all assumes that you already have data in the correct format and is about manipulating it, not about creating it from scratch.
Upvotes: 0
Views: 168
Reputation: 829
You can try:
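import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd

G = nx.Graph()
df = pd.read_csv('data.csv')

# Grouping by every column leaves one index entry per distinct row.
keys = list(df.groupby(list(df.columns)).count().index)

def key2id(key):
    # Name a node by the path of values leading to it, e.g. "0-1-1".
    return '-'.join(map(str, key))

for key in keys:
    prev = None
    # Walk the prefixes of the row, linking each one to the previous prefix.
    for i in range(1, len(key) + 1):
        k = key2id(key[:i])
        G.add_node(k)
        if prev is not None:
            G.add_edge(prev, k)
        prev = k

nx.draw(G, with_labels=True)
plt.show()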
Short explanation:
First we call groupby on all the relevant columns to eliminate duplicate rows. Each remaining row represents a leaf node; we iterate over all the leaf nodes and add all the intermediate nodes on the path to each one (along with the relevant edges).
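To see the prefix idea without pandas or networkx, here is a minimal standard-library sketch of the same approach, with the rows typed in from the question. The names ROWS, ROOT, prefix_edges and nested_tree are made up for illustration, and the synthetic "root" node is my own assumption (it joins the xyres=0 and xyres=1 branches into a single tree). It is a sketch, not the only way to do this.

```python
# Rows copied from the question: (xyres, zres, fms, flts, pts).
ROWS = [
    (1, 1, 0, 2, 0),
    (1, 1, 1, 0, 5),
    (1, 0, 0, 0, 5),
    (0, 1, 1, 0, 6),
    (0, 1, 0, 0, 1),
    (1, 0, 1, 2, 1),
    (0, 0, 0, 1, 6),
    (0, 1, 1, 2, 7),
    (0, 1, 0, 1, 0),
    (1, 0, 0, 0, 1),
]

ROOT = "root"  # hypothetical synthetic root, so the result is one tree


def prefix_edges(rows, root=ROOT):
    """Return the sorted (parent, child) edges of the prefix tree of `rows`.

    Each node is named by the path of values leading to it, e.g. "0-1-1",
    so equal values in different branches stay distinct nodes.
    """
    edges = set()  # a set drops the duplicates that shared prefixes create
    for row in rows:
        parent = root
        for depth in range(1, len(row) + 1):
            child = "-".join(str(value) for value in row[:depth])
            edges.add((parent, child))
            parent = child
    return sorted(edges)


def nested_tree(rows):
    """Build the same tree as nested dicts: {value: {value: ...}}."""
    tree = {}
    for row in rows:
        node = tree
        for value in row:
            # Reuse the child dict if this value was already seen here.
            node = node.setdefault(value, {})
    return tree


edges = prefix_edges(ROWS)
tree = nested_tree(ROWS)
print(len(edges), "edges")
print(sorted(tree[0][1]))  # fms values found under xyres=0, zres=1
```

The edge list can then be handed to networkx, for example with add_edges_from on an nx.DiGraph, if you want a directed tree instead of the undirected Graph used above. The nested dict form is handy when you only need to look things up by path.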
Upvotes: 1