Reputation: 115
I'm trying to create a sunburst plot where different rows have different lengths, and I get an error message ending in "dtype: object, 'is not a leaf.'". I have read the note "Note that the parents of None entries must be a leaf, i.e. it cannot have other children than None (otherwise a ValueError is raised)." on the plotly page https://plotly.com/python/sunburst-charts/#rectangular-data-with-missing-values, but I don't fully understand it.
I have a bigger dataset, but the same thing happens with this one:
import pandas as pd
import plotly.express as px

testdf = pd.DataFrame(
    [['Max', 10, 'M', 'a', 'x', None],
     ['Ma', 5, 'M', 'a', None, None],
     ['Johan', 6, 'J', 'o', 'h', 'a']],
    index=[1, 2, 3],
    columns=['First_Name', 'Count', 'a', 'b', 'c', 'd'])
testdf
fig = px.sunburst(testdf, path=['a', 'b', 'c', 'd'], values='Count')
fig.show()
The ValueError is this:
ValueError: ('Non-leaves rows are not permitted in the dataframe \n', a M b a c
d
Name: 1, dtype: object, 'is not a leaf.')
So I think it's caused by the fact that the letter a is not a leaf since the x of the first row is also attached to it, but I would like to have the sunburst stop at the letter a for the second row, and at the letter x for the first row. Any help is greatly appreciated!
Upvotes: 4
Views: 7085
Reputation: 610
Another viable solution is converting the path to a parent child hierarchy, as described in this issue: https://github.com/plotly/plotly.py/issues/4308
This yields the correct result, as long as path elements are distinct.
The plotly-specific layout_value column is necessary because of a bug with branchvalues="total" not rendering.
# %%
import typing as t
import pandas as pd
import plotly.express as px
# %%
def get_data() -> pd.DataFrame:
    return pd.DataFrame([
        { 'path': 'music/pop/jackson/billie_jean.mp3', 'score': 0.8, 'views': 1000 },
        { 'path': 'music/pop/jackson/beat_it.mp3', 'score': 0.9, 'views': 2000 },
        { 'path': 'music/pop/abba/dancing_queen.mp3', 'score': 0.7, 'views': 1500 },
        { 'path': 'music/pop/abba/voulez-vous/voulez-vous.mp3', 'score': 0.75, 'views': 1500 },
        { 'path': 'music/pop/abba/voulez-vous/summer_night_city.mp3', 'score': 0.8, 'views': 1500 },
        { 'path': 'music/pop/abba/waterloo.mp3', 'score': 0.8, 'views': 1500 },
        { 'path': 'music/pop/abba/chiquitita.mp3', 'score': 0.7, 'views': 1500 },
        { 'path': 'music/pop/abba/s.o.s.mp3', 'score': 0.7, 'views': 1500 },
        { 'path': 'music/rock/queen/bohemian_rhapsody.mp3', 'score': 0.9, 'views': 3000 },
    ])
# %%
col_path: str = 'path'
col_parent: str = 'parent'
def path_parent_fn(path):
    # Drop the last path segment; return None for a top-level entry.
    path = path.split('/')
    path = '/'.join(path[:-1]) if len(path) > 0 else ''
    path = path.strip()
    return path if len(path) > 0 else None
aggregation = { 'score': 'median', 'views': 'sum', 'layout_value': lambda x: 0 }
# %%
PathT = t.TypeVar('PathT')
Axis = t.Union[int, str]
def create_hierarchy_data(
    data: pd.DataFrame,
    col_path: Axis,
    col_parent: Axis,
    path_parent_fn: t.Callable[[PathT], t.Union[PathT, None]],
    aggregation: t.Any
) -> pd.DataFrame:
    data[col_parent] = data[col_path].apply(path_parent_fn)

    def parent_in_data_or_na():
        return data[col_parent].isin(data[col_path]) | data[col_parent].isna()

    # Keep adding rows for parents that are referenced but not yet present.
    while not parent_in_data_or_na().all(skipna=True):
        missing_parents = data[data[col_parent].isin(data[col_path]) == False]
        missing_parents = missing_parents.groupby(col_parent, as_index=False)
        missing_parents_keys = missing_parents.groups.keys()
        missing_parents = missing_parents.agg(aggregation)
        missing_parents[col_path] = missing_parents_keys
        missing_parents[col_parent] = missing_parents[col_path].apply(path_parent_fn)
        data = pd.concat([
            data,
            missing_parents
        ], ignore_index=True)

    data = data[data[col_path].isna() == False]
    return data
# %%
data = get_data()
data['layout_value'] = data['views']
data = create_hierarchy_data(data, col_path, col_parent, path_parent_fn, aggregation)
data
# %%
data = get_data()
data['layout_value'] = data['views']
data = create_hierarchy_data(data, col_path, col_parent, path_parent_fn, aggregation)
fig = px.treemap(data, names=col_path, parents=col_parent, values='layout_value', color='score', color_continuous_midpoint=0.5, hover_data=['views', 'score'])
fig.update_traces(hovertemplate='''
<b>%{label}</b><br>Votes: %{customdata[0]}<br>Score: %{customdata[1]}
''')
fig
# %%
data = get_data()
data['layout_value'] = data['views']
data = create_hierarchy_data(data, col_path, col_parent, path_parent_fn, aggregation)
fig = px.sunburst(data, names=col_path, parents=col_parent, values='layout_value', color='score', color_continuous_midpoint=0.5, branchvalues='remainder')
fig
Upvotes: 0
Reputation: 93
This is explained here: https://community.plotly.com/t/sunburst-chart-cant-handle-none-values/35383.
Basically, if a row has a None value, the node it stops at has to be a leaf: no other row may continue below that same node. In your example, the first row is valid. The second isn't, because ["M", "a", None, None] stops at "a", while ["M", "a", "x", None] continues below that same "a" with "x".
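To see which rows would trip this rule before calling px.sunburst, a small check along these lines may help. It is only a sketch of the rule as described here, not plotly's own implementation; find_non_leaf_rows is a hypothetical helper name, and it assumes missing levels only appear at the end of a row.

```python
def find_non_leaf_rows(rows, path_cols):
    """Return positions of rows that stop early at a node another row extends.

    Hypothetical helper: approximates the leaf rule, assuming None (or NaN)
    values only occur at the end of a row's path.
    """
    def prefix(row):
        out = []
        for col in path_cols:
            value = row[col]
            if value is None or value != value:  # None or NaN ends the path
                break
            out.append(value)
        return tuple(out)

    prefixes = [prefix(row) for row in rows]
    bad = []
    for i, p in enumerate(prefixes):
        if len(p) == len(path_cols):
            continue  # full-length path, nothing missing
        # A shortened row is invalid if some other row continues below it.
        if any(len(q) > len(p) and q[:len(p)] == p for q in prefixes):
            bad.append(i)
    return bad


path_cols = ["a", "b", "c", "d"]
rows = [
    {"First_Name": "Max", "Count": 10, "a": "M", "b": "a", "c": "x", "d": None},
    {"First_Name": "Ma", "Count": 5, "a": "M", "b": "a", "c": None, "d": None},
    {"First_Name": "Johan", "Count": 6, "a": "J", "b": "o", "c": "h", "d": "a"},
]
print(find_non_leaf_rows(rows, path_cols))  # prints [1]
```

With a DataFrame, the rows can be obtained with testdf.to_dict("records"); the returned values are positions in that list, not index labels.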
In fact, if you run the below, it works.
testdf = pd.DataFrame([
    ['Max', 10, 'M', 'a', 'x', None],
    ['Ma', 5, 'M', 'a', 'd', None],
    ['Johan', 6, 'J', 'o', 'h', 'a']],
    index=[1, 2, 3],
    columns=['First_Name', 'Count', 'a', 'b', 'c', 'd'])
testdf
fig = px.sunburst(testdf, path=['a', 'b', 'c', 'd'], values='Count')
fig.show()
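If you would rather keep the original rows, so that the 'Ma' row really stops at a while 'Max' continues to x, one possible workaround is to skip the path argument and build the ids, parents and values yourself. The sketch below is untested against your real data; build_nodes and to_sunburst_figure are hypothetical names, and it assumes labels never contain "/" and that missing levels only occur at the end of a row.

```python
def build_nodes(rows, path_cols, value_col):
    """Turn ragged path rows into id/label/parent/value nodes (hypothetical helper)."""
    totals = {}  # node id -> sum of the values of all rows passing through it
    for row in rows:
        prefix = []
        for col in path_cols:
            label = row[col]
            if label is None or label != label:  # None or NaN ends the path
                break
            prefix.append(str(label))
            node_id = "/".join(prefix)
            totals[node_id] = totals.get(node_id, 0) + row[value_col]

    nodes = []
    for node_id, total in totals.items():
        parent, _, label = node_id.rpartition("/")  # parent is "" for roots
        nodes.append({"id": node_id, "label": label, "parent": parent, "value": total})
    return nodes


def to_sunburst_figure(nodes):
    # Imported lazily so build_nodes can be used without plotly installed.
    import plotly.graph_objects as go

    return go.Figure(go.Sunburst(
        ids=[n["id"] for n in nodes],
        labels=[n["label"] for n in nodes],
        parents=[n["parent"] for n in nodes],
        values=[n["value"] for n in nodes],
        branchvalues="total",  # each value already includes its descendants
    ))


rows = [
    {"First_Name": "Max", "Count": 10, "a": "M", "b": "a", "c": "x", "d": None},
    {"First_Name": "Ma", "Count": 5, "a": "M", "b": "a", "c": None, "d": None},
    {"First_Name": "Johan", "Count": 6, "a": "J", "b": "o", "c": "h", "d": "a"},
]
nodes = build_nodes(rows, ["a", "b", "c", "d"], "Count")
```

Here the node M/a gets the value 15 and its only child M/a/x gets 10, so the remaining 5 from the 'Ma' row should show up as an empty gap in the outer ring. Calling to_sunburst_figure(nodes).show() displays the chart; with a DataFrame, rows can come from testdf.to_dict("records").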
Upvotes: 6