Reputation: 99
I'm new to Pandas and have a complex requirement.
I have a data frame with several columns, like the one below:
Parent                          Child     Color
IT;Programming;Trending;Demand  Python    #6200ea
IT;Programming;Trending;Demand  Docker    #7c4dff
IT;Programming;Testing          Selenium  #b388ff
IT;Programming;Old              C/C++/C#  #ff1744
IT-Tools;Testing                HP UFT    #aa00ff
IT-Tools;IDE                    PyCharm   #9c27b0
I've used str.split(';') to split Parent into multiple branch columns in the data frame:
df = df.join(df.Parent.str.split(";", expand=True).add_prefix('branch'))
df.drop(columns=['Parent'], inplace=True)
print(df)
Output:
branch0   branch1      branch2   branch3  Child     Color
IT        Programming  Trending  Demand   Python    #6200ea
IT        Programming  Trending  Demand   Docker    #7c4dff
IT        Programming  Testing   None     Selenium  #b388ff
IT        Programming  Old       None     C/C++/C#  #ff1744
IT-Tools  Testing      None      None     HP UFT    #aa00ff
IT-Tools  IDE          None      None     PyCharm   #9c27b0
I need to generate a classification tree, which requires JSON (including the color value) in the format shown on the website below:
https://bl.ocks.org/swayvil/b86f8d4941bdfcbfff8f69619cd2f460#data-example.json
Can somebody please help me? Thank you!
Upvotes: 0
Views: 527
Reputation: 19289
Your data needs updates for a d3 tree:
- IT and IT-Tools need a parent node.
- The Testing node is ambiguous because it is a child of both Programming and IT-Tools, so you will need to rename it, e.g. Testing1 and Testing2.
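As a rough way to spot such ambiguous names before renaming anything, the sketch below lists every node that appears under more than one parent. It uses only the standard library; the `paths` list (Parent joined with Child) and the helper name `find_ambiguous_nodes` are illustrative assumptions, not part of the original data frame code.

```python
from collections import defaultdict

# Paths copied from the question: the Parent value plus the Child, joined by ';'.
paths = [
    "IT;Programming;Trending;Demand;Python",
    "IT;Programming;Trending;Demand;Docker",
    "IT;Programming;Testing;Selenium",
    "IT;Programming;Old;C/C++/C#",
    "IT-Tools;Testing;HP UFT",
    "IT-Tools;IDE;PyCharm",
]

def find_ambiguous_nodes(paths, sep=";"):
    """Return {node: sorted parents} for nodes seen under more than one parent."""
    parents_of = defaultdict(set)
    for path in paths:
        parts = path.split(sep)
        # Each consecutive pair in a path is one parent -> child relation.
        for parent, child in zip(parts, parts[1:]):
            parents_of[child].add(parent)
    return {node: sorted(ps) for node, ps in parents_of.items() if len(ps) > 1}

ambiguous = find_ambiguous_nodes(paths)
print(ambiguous)  # {'Testing': ['IT-Tools', 'Programming']}
```

Any name reported here needs a distinct label per parent before the data can describe a single tree.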
Your 2nd data frame shows that your hierarchy is unbalanced (leaves at different depths) because of the None values in branch2 and branch3. Therefore, your output JSON should contain individual branch definitions (one parent/child pair per record), rather than multiple definitions per row, like this:
parent,child,color
IT,Programming,None
Programming,Trending,None
Trending,Demand,None
Demand,Python,#6200ea
etc
This is more efficient than multiple branches per row, which is redundant. E.g. IT being the parent of Programming is defined multiple times, whereas with the parent/child structure it is defined once.
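To make the redundancy concrete, here is a small standard-library sketch that counts how often each parent/child relation is repeated across the rows. The `rows` list is an assumption: it joins Parent and Child from the question and already uses the Testing1/Testing2 and HP-UFT renames.

```python
from collections import Counter

# One full path per input row (Parent plus Child), with the suggested renames applied.
rows = [
    "IT;Programming;Trending;Demand;Python",
    "IT;Programming;Trending;Demand;Docker",
    "IT;Programming;Testing1;Selenium",
    "IT;Programming;Old;C/C++/C#",
    "IT-Tools;Testing2;HP-UFT",
    "IT-Tools;IDE;PyCharm",
]

edge_counts = Counter()
for path in rows:
    parts = path.split(";")
    # Count every consecutive (parent, child) pair along the path.
    edge_counts.update(zip(parts, parts[1:]))

total_edges = sum(edge_counts.values())  # relations implied row by row
unique_edges = set(edge_counts)          # relations in a parent/child table

print(total_edges, len(unique_edges))     # 18 13
print(edge_counts[("IT", "Programming")])  # 4
```

The row-per-path form states 18 relations where 13 distinct ones exist, and IT/Programming alone is repeated four times.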
The following code translates your input into an output you can send as a response to a client, where d3 can then build a tree from it.
We can use a set to store unique strings, one per parent/child pair along each path, and then add an item for the last parent and its Child (with the color). Then create another dataframe from this set (creating columns by splitting on ; in a similar vein to your OP) and export it as JSON:
import io
import pandas as pd

# data as a string
text = '''Parent Child Color
IT;Programming;Trending;Demand Python #6200ea
IT;Programming;Trending;Demand Docker #7c4dff
IT;Programming;Testing1 Selenium #b388ff
IT;Programming;Old C/C++/C# #ff1744
IT-Tools;Testing2 HP-UFT #aa00ff
IT-Tools;IDE PyCharm #9c27b0'''

# your original data frame
df = pd.read_csv(io.StringIO(text), sep=r'\s+')

# prepend a Root to Parent column
df.Parent = 'Root;' + df.Parent

# dataframe to set for unique branches
# start with just root in the set
branch_strings = {';Root;'}
for index, row in df.iterrows():
    parents = row.Parent.split(';')
    # one "parent;child;" string (empty color) per consecutive pair of parents
    for curr, nxt in zip(parents, parents[1:]):
        branch_strings.add(';'.join([curr, nxt, '']))
    # the last parent owns the leaf, which carries the color
    branch_strings.add(';'.join([parents[-1], row.Child, row.Color]))

# set to list
branches = [branch.split(';') for branch in branch_strings]

# new dataframe from relations
df2 = pd.DataFrame(branches, columns=['parent', 'child', 'color'])

# JSON (named json_str so it does not shadow the standard json module)
json_str = df2.to_json(orient='records')
print(json_str)
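Which produces JSON like this (the record order can differ between runs because a set is unordered), shown here assigned to a JavaScript constant: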
const data = [
{"parent":"Programming","child":"Old","color":""},
{"parent":"Testing1","child":"Selenium","color":"#b388ff"},
{"parent":"","child":"Root","color":""},{"parent":"IT-Tools","child":"IDE","color":""},
{"parent":"IDE","child":"PyCharm","color":"#9c27b0"},
{"parent":"Programming","child":"Trending","color":""},
{"parent":"IT","child":"Programming","color":""},
{"parent":"Trending","child":"Demand","color":""},
{"parent":"Root","child":"IT","color":""},
{"parent":"Old","child":"C\/C++\/C#","color":"#ff1744"},
{"parent":"Programming","child":"Testing1","color":""},
{"parent":"Root","child":"IT-Tools","color":""},{"parent":"IT-Tools","child":"Testing2","color":""},
{"parent":"Demand","child":"Docker","color":"#7c4dff"},
{"parent":"Demand","child":"Python","color":"#6200ea"},
{"parent":"Testing2","child":"HP-UFT","color":"#aa00ff"}
];
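Before handing records like these to d3.stratify, it can help to check them on the Python side, since stratify throws on duplicate ids, on a missing parent, and when there is not exactly one root. The sketch below is one possible check using only the standard library; the helper name `check_stratify_ready` and the shortened `records` sample are assumptions for illustration, and cycles are not checked.

```python
import json

def check_stratify_ready(records):
    """Return a list of problems that would stop the flat table forming one tree."""
    problems = []
    ids = [r["child"] for r in records]
    # Every node id (the child column) must be unique.
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        problems.append(f"duplicate ids: {duplicates}")
    # Exactly one record may have an empty parent: the root.
    roots = [r["child"] for r in records if not r["parent"]]
    if len(roots) != 1:
        problems.append(f"expected exactly one root, found {roots}")
    # Every non-empty parent must itself be defined as a child somewhere.
    known = set(ids)
    missing = sorted({r["parent"] for r in records
                      if r["parent"] and r["parent"] not in known})
    if missing:
        problems.append(f"unknown parents: {missing}")
    return problems

# A shortened sample in the same shape as the JSON output above.
records = json.loads('''[
  {"parent": "", "child": "Root", "color": ""},
  {"parent": "Root", "child": "IT", "color": ""},
  {"parent": "IT", "child": "Programming", "color": ""},
  {"parent": "Programming", "child": "Testing1", "color": ""},
  {"parent": "Testing1", "child": "Selenium", "color": "#b388ff"}
]''')
print(check_stratify_ready(records))  # []

# Reusing a name under a second, undefined parent triggers two problems.
bad = records + [{"parent": "IT-Tools", "child": "Testing1", "color": ""}]
print(check_stratify_ready(bad))
```

An empty list means the basic preconditions hold; anything else points at the rows to rename or add.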
For a D3 example with that JSON, please review the accepted answer here. The adaptation below is just a proof of concept for your unbalanced hierarchy input. Converting it to the block in your OP is beyond the scope of a Stack Overflow answer, but this should point you in the right direction:
const data = [
{"parent":"Programming","child":"Old","color":""},
{"parent":"Testing1","child":"Selenium","color":"#b388ff"},
{"parent":"","child":"Root","color":""},{"parent":"IT-Tools","child":"IDE","color":""},
{"parent":"IDE","child":"PyCharm","color":"#9c27b0"},
{"parent":"Programming","child":"Trending","color":""},
{"parent":"IT","child":"Programming","color":""},
{"parent":"Trending","child":"Demand","color":""},
{"parent":"Root","child":"IT","color":""},
{"parent":"Old","child":"C\/C++\/C#","color":"#ff1744"},
{"parent":"Programming","child":"Testing1","color":""},
{"parent":"Root","child":"IT-Tools","color":""},{"parent":"IT-Tools","child":"Testing2","color":""},
{"parent":"Demand","child":"Docker","color":"#7c4dff"},
{"parent":"Demand","child":"Python","color":"#6200ea"},
{"parent":"Testing2","child":"HP-UFT","color":"#aa00ff"}
];
// build a hierarchy from the flat parent/child records
const root = d3.stratify()
  .id(d => d["child"])
  .parentId(d => d["parent"])
  (data);

const margin = {left: 40, top: 40, right: 40, bottom: 40};
const width = 500;
const height = 200;

const svg = d3.select("body")
  .append("svg")
  .attr("width", width)
  .attr("height", height);

// shift the drawing area by the left and top margins
const g = svg.append("g")
  .attr('transform', `translate(${margin.left},${margin.top})`);

const tree = d3.tree()
  .size([height - margin.top - margin.bottom, width - margin.left - margin.right]);

// links: x and y are swapped so the tree grows left to right
const link = g.selectAll(".link")
  .data(tree(root).links())
  .enter()
  .append("path")
  .attr("class", "link")
  .attr("d", d3.linkHorizontal()
    .x(function(d) { return d.y; })
    .y(function(d) { return d.x; })
  );

const node = g.selectAll(".node")
  .data(root.descendants())
  .enter()
  .append("g")
  .attr("transform", function(d) {
    return `translate(${d.y},${d.x})`;
  });

// nodes without a color in the data fall back to black
node.append("circle")
  .attr("r", 5)
  .style("fill", function(d) {
    return d.data.color ? d.data.color : "#000";
  });

node.append("text")
  .attr("class", "label")
  .text(function(d) { return d.data.child; })
  .attr('y', -4)
  .attr('x', 0)
  .attr('text-anchor', 'middle');
path {
  fill: none;
  stroke: steelblue;
  stroke-width: 1px;
}
.label {
  font-size: smaller;
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/4.10.0/d3.min.js"></script>
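If a nested structure is needed rather than flat records, the same parent/child/color records can be folded into a tree on the Python side. The sketch below is only a starting point: the `name`/`children` field names follow a common D3 convention and are an assumption, so check them against the data-example.json in the linked block; the helper name `nest` and the shortened `records` sample are also illustrative.

```python
import json

def nest(records):
    """Turn flat parent/child/color records into a nested name/children dict."""
    # Create every node first so records can arrive in any order.
    nodes = {r["child"]: {"name": r["child"]} for r in records}
    root = None
    for r in records:
        node = nodes[r["child"]]
        if r["color"]:
            node["color"] = r["color"]
        if r["parent"]:
            # Attach the node to its parent's children list.
            nodes[r["parent"]].setdefault("children", []).append(node)
        else:
            # The record with an empty parent is the root.
            root = node
    return root

# A shortened sample in the same shape as the flat JSON above.
records = [
    {"parent": "IDE", "child": "PyCharm", "color": "#9c27b0"},
    {"parent": "", "child": "Root", "color": ""},
    {"parent": "Root", "child": "IT-Tools", "color": ""},
    {"parent": "IT-Tools", "child": "IDE", "color": ""},
]

tree = nest(records)
print(json.dumps(tree, indent=2))
```

The helper assumes unique child names and exactly one root, which the Testing1/Testing2 renames and the added Root node are meant to guarantee.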
Upvotes: 1