Shikhar

How to generate JSON from a pandas data frame for dynamic d3.js tree

I'm new to Pandas and have a complex requirement.

I have a data frame with multiple columns, like the one below:

Parent                            Child      Color
IT;Programming;Trending;Demand    Python     #6200ea
IT;Programming;Trending;Demand    Docker     #7c4dff
IT;Programming;Testing            Selenium   #b388ff
IT;Programming;Old                C/C++/C#   #ff1744
IT-Tools;Testing                  HP UFT     #aa00ff
IT-Tools;IDE                      PyCharm    #9c27b0

I've used str.split(';') to generate multiple Parent columns in the data frame:

df = df.join(df.Parent.str.split(";", expand=True).add_prefix('branch'))
df.drop(columns=['Parent'], inplace=True)
print(df)

Output:

branch0    branch1        branch2    branch3   Child      Color
IT         Programming    Trending   Demand    Python     #6200ea
IT         Programming    Trending   Demand    Docker     #7c4dff
IT         Programming    Testing    None      Selenium   #b388ff
IT         Programming    Old        None      C/C++/C#   #ff1744
IT-Tools   Testing        None       None      HP UFT     #aa00ff
IT-Tools   IDE            None       None      PyCharm    #9c27b0

I need to generate a classification tree, for which I need JSON (including the color values) in the format used on the website below:

https://bl.ocks.org/swayvil/b86f8d4941bdfcbfff8f69619cd2f460#data-example.json

Can somebody please help me? Thank you!

Answers (1)

Robin Mackenzie

Your data needs two changes before it can be used for a d3 tree:

  • A d3 tree needs a single root node, so IT and IT-Tools need a common parent node.
  • The Testing node is ambiguous because it is a child of both Programming and IT-Tools, so you will need to give each occurrence a unique name, e.g. Testing1 and Testing2.
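Both problems can be detected before any JSON is built. The following is a minimal sketch that assumes the original (not yet renamed) Parent/Child columns from the question. `top_level_names` and `ambiguous_names` are hypothetical helper names, not pandas API.

```python
import pandas as pd

# The Parent/Child columns from the question, before any renaming
df = pd.DataFrame({
    "Parent": [
        "IT;Programming;Trending;Demand",
        "IT;Programming;Trending;Demand",
        "IT;Programming;Testing",
        "IT;Programming;Old",
        "IT-Tools;Testing",
        "IT-Tools;IDE",
    ],
    "Child": ["Python", "Docker", "Selenium", "C/C++/C#", "HP UFT", "PyCharm"],
})


def top_level_names(frame: pd.DataFrame) -> list:
    """Names that start a Parent path; a d3 tree needs exactly one."""
    return sorted({path.split(";")[0] for path in frame["Parent"]})


def ambiguous_names(frame: pd.DataFrame) -> dict:
    """Names reached from more than one distinct parent."""
    parents_of = {}
    for path, child in zip(frame["Parent"], frame["Child"]):
        nodes = path.split(";") + [child]
        # Each consecutive pair of nodes on the path is a (parent, child) edge
        for parent, node in zip(nodes, nodes[1:]):
            parents_of.setdefault(node, set()).add(parent)
    return {name: ps for name, ps in parents_of.items() if len(ps) > 1}


print(top_level_names(df))  # → ['IT', 'IT-Tools']
print(ambiguous_names(df))
```

For this data, the second call should report only `Testing`, with `Programming` and `IT-Tools` as its parents. Set ordering is not fixed, so the parents may print in either order.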

The Nones in branch2 and branch3 of your second data frame show that your hierarchy is unbalanced (leaves sit at different depths). Therefore your output JSON should define one parent/child branch per record, rather than a whole path per row, like this:

parent,child,color
IT,Programming,None
Programming,Trending,None
Trending,Demand,None
Demand,Python,#6200ea
etc

This is more efficient than the path-per-row layout, which is redundant: the fact that IT is the parent of Programming is repeated in every IT row, whereas the parent/child structure states it only once.
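This short sketch makes the redundancy concrete. It uses only the standard library and counts how often each ancestor edge is written when every row carries its full path. It assumes the Parent paths from the question with Testing renamed as suggested above. Leaf edges to the Child column are left out because each of them occurs only once anyway.

```python
from collections import Counter

# Parent paths from the question, with Testing renamed to Testing1/Testing2
paths = [
    "IT;Programming;Trending;Demand",
    "IT;Programming;Trending;Demand",
    "IT;Programming;Testing1",
    "IT;Programming;Old",
    "IT-Tools;Testing2",
    "IT-Tools;IDE",
]

# Count every (parent, child) edge as it appears when each row stores its whole path
edge_counts = Counter(
    (parent, child)
    for path in paths
    for parent, child in zip(path.split(";"), path.split(";")[1:])
)

print(edge_counts[("IT", "Programming")])  # → 4
print(sum(edge_counts.values()), len(edge_counts))  # → 12 7
```

Twelve edges are written row-wise, but only seven are distinct. The parent/child format stores each of those seven once.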

The following code translates your input into output that you can send as a response to a client, which can then use d3 to build a tree.

We can use a set to store a unique string for each pair of consecutive parents, plus one item linking the last parent to the child. Then we create another data frame from this set (building the columns by splitting on ;, in a similar vein to your OP) and export it as JSON:

import io
import pandas as pd

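# data as a string (Testing is split into Testing1/Testing2, and HP UFT is
# written as HP-UFT so that whitespace can separate the columns)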
text = '''Parent Child Color
IT;Programming;Trending;Demand    Python     #6200ea
IT;Programming;Trending;Demand    Docker     #7c4dff
IT;Programming;Testing1           Selenium   #b388ff
IT;Programming;Old                C/C++/C#   #ff1744
IT-Tools;Testing2                 HP-UFT     #aa00ff
IT-Tools;IDE                      PyCharm    #9c27b0'''

# your original data frame
df = pd.read_csv(io.StringIO(text), sep=r'\s+')

# prepend Root to the Parent column so there is a single root node
df['Parent'] = 'Root;' + df['Parent']

# collect unique "parent;child;color" strings in a set
# start with just the root, which has an empty parent
branch_strings = {';Root;'}
for _, row in df.iterrows():
    parents = row.Parent.split(';')
    # one branch per consecutive pair of parents (inner nodes have no color)
    for curr, nxt in zip(parents, parents[1:]):
        branch_strings.add(';'.join([curr, nxt, '']))
    # the last parent links to the leaf, which carries the color
    branch_strings.add(';'.join([parents[-1], row.Child, row.Color]))

# set to list of [parent, child, color]
branches = [branch.split(';') for branch in branch_strings]

# new dataframe from relations
df2 = pd.DataFrame(branches, columns=['parent', 'child', 'color'])

# JSON (named json_str so it does not shadow the standard json module)
json_str = df2.to_json(orient='records')
print(json_str)

This produces JSON like the following. print emits it on a single line; it is wrapped here for readability. The row order can differ between runs because the branches come from a set.

[
  {"parent":"Programming","child":"Old","color":""},
  {"parent":"Testing1","child":"Selenium","color":"#b388ff"},
  {"parent":"","child":"Root","color":""},
  {"parent":"IT-Tools","child":"IDE","color":""},
  {"parent":"IDE","child":"PyCharm","color":"#9c27b0"},
  {"parent":"Programming","child":"Trending","color":""},
  {"parent":"IT","child":"Programming","color":""},
  {"parent":"Trending","child":"Demand","color":""},
  {"parent":"Root","child":"IT","color":""},
  {"parent":"Old","child":"C\/C++\/C#","color":"#ff1744"},
  {"parent":"Programming","child":"Testing1","color":""},
  {"parent":"Root","child":"IT-Tools","color":""},
  {"parent":"IT-Tools","child":"Testing2","color":""},
  {"parent":"Demand","child":"Docker","color":"#7c4dff"},
  {"parent":"Demand","child":"Python","color":"#6200ea"},
  {"parent":"Testing2","child":"HP-UFT","color":"#aa00ff"}
]
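Before passing the records to d3.stratify, it can be worth checking them in Python. The sketch below only roughly approximates the conditions under which, to my understanding, d3.stratify refuses to build a tree: a parent id that matches no node, a duplicated id, or zero or several roots. `check_stratify_records` is a hypothetical helper. The sample records are made up to trigger two of the checks.

```python
import json
from collections import Counter


def check_stratify_records(records: list) -> list:
    """List problems that would likely stop d3.stratify from building a single tree."""
    problems = []
    id_counts = Counter(r["child"] for r in records)
    # Node ids (the "child" field) should be unique
    duplicates = sorted(i for i, n in id_counts.items() if n > 1)
    if duplicates:
        problems.append(f"duplicate ids: {duplicates}")
    # Exactly one record should have an empty parent: the root
    roots = [r["child"] for r in records if not r["parent"]]
    if len(roots) != 1:
        problems.append(f"expected one root, found {roots}")
    # Every non-empty parent should itself be the id of some record
    missing = sorted({r["parent"] for r in records
                      if r["parent"] and r["parent"] not in id_counts})
    if missing:
        problems.append(f"unknown parents: {missing}")
    return problems


# Hand-written records that deliberately reuse the ambiguous "Testing" name
# and omit the Root -> IT-Tools branch
sample = json.loads(
    '[{"parent":"","child":"Root","color":""},'
    '{"parent":"Root","child":"IT","color":""},'
    '{"parent":"IT","child":"Programming","color":""},'
    '{"parent":"Programming","child":"Testing","color":""},'
    '{"parent":"IT-Tools","child":"Testing","color":""}]'
)
for problem in check_stratify_records(sample):
    print(problem)
# prints:
# duplicate ids: ['Testing']
# unknown parents: ['IT-Tools']
```

With the answer's data frame, you could pass `df2.to_dict(orient='records')` instead. An empty list means none of these checks found a problem.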

For a D3 example that uses JSON like this, please review the accepted answer here. The adaptation below is just a proof of concept for your unbalanced hierarchy input. Converting it to the block linked in your question is beyond the scope of a Stack Overflow answer, but this should point you in the right direction:

const data = [
  {"parent":"Programming","child":"Old","color":""},
  {"parent":"Testing1","child":"Selenium","color":"#b388ff"},
  {"parent":"","child":"Root","color":""},
  {"parent":"IT-Tools","child":"IDE","color":""},
  {"parent":"IDE","child":"PyCharm","color":"#9c27b0"},
  {"parent":"Programming","child":"Trending","color":""},
  {"parent":"IT","child":"Programming","color":""},
  {"parent":"Trending","child":"Demand","color":""},
  {"parent":"Root","child":"IT","color":""},
  {"parent":"Old","child":"C\/C++\/C#","color":"#ff1744"},
  {"parent":"Programming","child":"Testing1","color":""},
  {"parent":"Root","child":"IT-Tools","color":""},
  {"parent":"IT-Tools","child":"Testing2","color":""},
  {"parent":"Demand","child":"Docker","color":"#7c4dff"},
  {"parent":"Demand","child":"Python","color":"#6200ea"},
  {"parent":"Testing2","child":"HP-UFT","color":"#aa00ff"}
];

const root = d3.stratify()
  .id(d => d["child"])
  .parentId(d => d["parent"])
  (data);

const margin = {left: 40, top: 40, right: 40, bottom: 40};
const width = 500;
const height = 200;
const svg = d3.select("body")
  .append("svg")
  .attr("width", width)
  .attr("height", height);

// offset the drawing area by the left and top margins
const g = svg.append("g")
  .attr('transform', `translate(${margin.left},${margin.top})`);

const tree = d3.tree()
  .size([height-margin.top-margin.bottom,width-margin.left-margin.right]);

const link = g.selectAll(".link")
  .data(tree(root).links())
  .enter()
  .append("path")
  .attr("class", "link")
  .attr("d", d3.linkHorizontal()
    .x(function(d) { return d.y; })
    .y(function(d) { return d.x; })
  );

const node = g.selectAll(".node")
  .data(root.descendants())
  .enter()
  .append("g")
  .attr("transform", function(d) { 
    return `translate(${d.y},${d.x})`; 
  });

node.append("circle")
  .attr("r", 5)
  .style("fill", function(d) {
    return d.data.color ? d.data.color : "#000";
  });
      
node.append("text")
  .attr("class", "label")
  .text(function(d) { return d.data.child; })
  .attr('y', -4)
  .attr('x', 0)
  .attr('text-anchor','middle');
path {
  fill: none;
  stroke: steelblue;
  stroke-width: 1px;
}

.label {
  font-size: smaller;
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/4.10.0/d3.min.js"></script>
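If you want the nested layout that d3.hierarchy consumes (objects with a `children` array) instead of flat records, you can fold the parent/child records into it in Python. This is a minimal sketch that assumes the keys `name`, `color`, and `children`. I have not checked which field names the linked block actually reads, so rename them to match. `to_nested` is a hypothetical helper, and the records are a hand-picked subset of the output above.

```python
import json


def to_nested(records: list, root_id: str = "Root") -> dict:
    """Turn parent/child/color records into nested {name, color, children} dicts."""
    children_of = {}
    color_of = {}
    for r in records:
        children_of.setdefault(r["parent"], []).append(r["child"])
        color_of[r["child"]] = r["color"]

    def build(name: str) -> dict:
        node = {"name": name}
        # only leaves carry a non-empty color in this data
        if color_of.get(name):
            node["color"] = color_of[name]
        # sort children so the output does not depend on record order
        kids = sorted(children_of.get(name, []))
        if kids:
            node["children"] = [build(k) for k in kids]
        return node

    return build(root_id)


records = [
    {"parent": "", "child": "Root", "color": ""},
    {"parent": "Root", "child": "IT-Tools", "color": ""},
    {"parent": "IT-Tools", "child": "IDE", "color": ""},
    {"parent": "IDE", "child": "PyCharm", "color": "#9c27b0"},
    {"parent": "IT-Tools", "child": "Testing2", "color": ""},
    {"parent": "Testing2", "child": "HP-UFT", "color": "#aa00ff"},
]
print(json.dumps(to_nested(records), indent=2))
```

With the answer's data frame, `to_nested(df2.to_dict(orient='records'))` would build the full tree. The recursion assumes the records already form a tree with no cycles.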
