Sanne Hombroek

Reputation: 115

I'm trying to create a plotly sunburst plot but get error message: 'dtype: object, 'is not a leaf.'

I'm trying to create a sunburst plot in which different rows have paths of different lengths, and I get the error message 'dtype: object, 'is not a leaf.' The plotly docs at https://plotly.com/python/sunburst-charts/#rectangular-data-with-missing-values say 'Note that the parents of None entries must be a leaf, i.e. it cannot have other children than None (otherwise a ValueError is raised).', but I don't fully understand what that means.

I have a bigger dataset, but the same thing happens with this one:

import pandas as pd
import plotly.express as px

testdf = pd.DataFrame(
    [['Max', 10, 'M', 'a', 'x', None],
     ['Ma', 5, 'M', 'a', None, None],
     ['Johan', 6, 'J', 'o', 'h', 'a']],
    index=[1, 2, 3],
    columns=['First_Name', 'Count', 'a', 'b', 'c', 'd'])
testdf

fig = px.sunburst(testdf, path=['a', 'b', 'c', 'd'], values='Count')
fig.show()

The ValueError is this:

ValueError: ('Non-leaves rows are not permitted in the dataframe \n', a M b a c
d
Name: 1, dtype: object, 'is not a leaf.')

So I think it's caused by the fact that the letter a is not a leaf since the x of the first row is also attached to it, but I would like to have the sunburst stop at the letter a for the second row, and at the letter x for the first row. Any help is greatly appreciated!

Upvotes: 4

Views: 7085

Answers (2)

Prophet Lamb

Reputation: 610

Another viable solution is to convert the path into a parent-child hierarchy, as described in this issue: https://github.com/plotly/plotly.py/issues/4308

This yields the correct result as long as the path elements are distinct. The plotly-specific layout_value column is necessary because of a bug where the chart does not render with branchvalues="total". It holds the real value for the leaf rows and 0 for the generated parent rows, so the sunburst can be drawn with branchvalues='remainder'.

# %%
import typing as t
import pandas as pd
import plotly.express as px

# %%
def get_data() -> pd.DataFrame:
    return pd.DataFrame([
        { 'path': 'music/pop/jackson/billie_jean.mp3', 'score': 0.8, 'views': 1000 },
        { 'path': 'music/pop/jackson/beat_it.mp3', 'score': 0.9, 'views': 2000 },
        { 'path': 'music/pop/abba/dancing_queen.mp3', 'score': 0.7, 'views': 1500 },
        { 'path': 'music/pop/abba/voulez-vous/voulez-vous.mp3', 'score': 0.75, 'views': 1500 },
        { 'path': 'music/pop/abba/voulez-vous/summer_night_city.mp3', 'score': 0.8, 'views': 1500 },
        { 'path': 'music/pop/abba/waterloo.mp3', 'score': 0.8, 'views': 1500 },
        { 'path': 'music/pop/abba/chiquitita.mp3', 'score': 0.7, 'views': 1500 },
        { 'path': 'music/pop/abba/s.o.s.mp3', 'score': 0.7, 'views': 1500 },
        { 'path': 'music/rock/queen/bohemian_rhapsody.mp3', 'score': 0.9, 'views': 3000 },
    ])


# %%

col_path: str = 'path'
col_parent: str = 'parent'
def path_parent_fn(path):
  # Return the parent path ('a/b/c' -> 'a/b'), or None for a top-level path.
  parent = '/'.join(path.split('/')[:-1]).strip()
  return parent if parent else None

aggregation = { 'score': 'median', 'views': 'sum', 'layout_value': lambda x: 0 }


# %%

PathT = t.TypeVar('PathT')
Axis = t.Union[int, str]

def create_hierarchy_data(
  data: pd.DataFrame,
  col_path: Axis,
  col_parent: Axis,
  path_parent_fn: t.Callable[[PathT], t.Union[PathT, None]],
  aggregation: t.Any
) -> pd.DataFrame:
  # Derive each row's parent path. Note that this adds a column to the frame passed in.
  data[col_parent] = data[col_path].apply(path_parent_fn)

  def parent_in_data_or_na():
    return data[col_parent].isin(data[col_path]) | data[col_parent].isna()

  # Add one level of missing ancestors per pass until every parent has its own row.
  while not parent_in_data_or_na().all():
    missing_parents = data[~data[col_parent].isin(data[col_path])]
    # groupby drops rows whose parent is None, so only real missing parents remain.
    missing_parents = missing_parents.groupby(col_parent, as_index=False).agg(aggregation)
    # Each group key (a missing parent path) becomes a row of its own.
    missing_parents[col_path] = missing_parents[col_parent]
    missing_parents[col_parent] = missing_parents[col_path].apply(path_parent_fn)
    data = pd.concat([data, missing_parents], ignore_index=True)
  return data[data[col_path].notna()]


# %%

data = get_data()
data['layout_value'] = data['views']
data = create_hierarchy_data(data, col_path, col_parent, path_parent_fn, aggregation)
data

# %%


data = get_data()
data['layout_value'] = data['views']
data = create_hierarchy_data(data, col_path, col_parent, path_parent_fn, aggregation)
fig = px.treemap(data, names=col_path, parents=col_parent, values='layout_value', color='score', color_continuous_midpoint=0.5, hover_data=['views', 'score'])
fig.update_traces(hovertemplate='''
<b>%{label}</b><br>Views: %{customdata[0]}<br>Score: %{customdata[1]}
''')
fig


# %%

data = get_data()
data['layout_value'] = data['views']
data = create_hierarchy_data(data, col_path, col_parent, path_parent_fn, aggregation)
fig = px.sunburst(data, names=col_path, parents=col_parent, values='layout_value', color='score', color_continuous_midpoint=0.5, branchvalues='remainder')
fig
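
The same parent/child idea can be sketched for the table in the question without pandas. This is only a minimal illustration, not plotly's own method: build_hierarchy and its "/"-joined ids are hypothetical names chosen here, and it assumes that no label contains "/" and that None entries simply end a path. Each node gets the summed Count of the rows that stop exactly at it, which matches branchvalues='remainder'.

```python
from collections import defaultdict


def build_hierarchy(rows):
    """rows: iterable of (path, value) pairs; None entries in a path are dropped."""
    own_value = defaultdict(int)  # value of the rows that end exactly at a node
    parent_of = {}                # node id -> parent id ("" for a root node)
    for path, value in rows:
        parts = [p for p in path if p is not None]
        for depth in range(1, len(parts) + 1):
            node_id = "/".join(parts[:depth])
            parent_of.setdefault(node_id, "/".join(parts[:depth - 1]))
        own_value["/".join(parts)] += value
    ids = list(parent_of)  # insertion order: parents always come before children
    return {
        "ids": ids,
        "labels": [i.split("/")[-1] for i in ids],
        "parents": [parent_of[i] for i in ids],
        "values": [own_value[i] for i in ids],
    }


# The rows from the question: (path columns a..d, Count)
rows = [
    (("M", "a", "x", None), 10),
    (("M", "a", None, None), 5),
    (("J", "o", "h", "a"), 6),
]
hierarchy = build_hierarchy(rows)
for key, column in hierarchy.items():
    print(key, column)
```

The resulting lists should be usable as the ids, labels, parents and values arguments of plotly.graph_objects.Sunburst together with branchvalues='remainder', so that the 'Ma' row stops at the letter a while 'Max' continues to x.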

Upvotes: 0

Alberto

Reputation: 93

This is explained here: https://community.plotly.com/t/sunburst-chart-cant-handle-none-values/35383.

Basically, if a row contains None, the part of its path before the None must not have any other children. In your example, the first row is valid. The second isn't, because ["M", "a", None, None] stops at "a", while ["M", "a", "x", None] gives that same "a" a child "x".

In fact, if you give the second row a distinct child, as below, it works.

testdf = pd.DataFrame([
    ['Max', 10, 'M', 'a', 'x', None],
    ['Ma', 5, 'M', 'a', 'd', None],
    ['Johan', 6, 'J', 'o', 'h', 'a']],
    index=[1, 2, 3],
    columns=['First_Name', 'Count', 'a', 'b', 'c', 'd'])

testdf

fig = px.sunburst(testdf, path=['a', 'b', 'c', 'd'], values='Count')
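
To see up front which rows would break the rule, a small check like the one below can help. It is a sketch of the rule as described above, not plotly's actual validation code; trimmed and non_leaf_rows are hypothetical helper names, and only trailing None padding is considered.

```python
def trimmed(path):
    """Drop trailing None entries from a path."""
    path = list(path)
    while path and path[-1] is None:
        path.pop()
    return tuple(path)


def non_leaf_rows(paths):
    """Return positions of rows that stop early although their node has children."""
    trimmed_paths = [trimmed(p) for p in paths]
    bad = []
    for i, p in enumerate(trimmed_paths):
        if len(p) == len(paths[i]):
            continue  # no None padding, so the row is always a leaf
        for j, q in enumerate(trimmed_paths):
            # Another row continues below the node where this row stops.
            if i != j and len(q) > len(p) and q[:len(p)] == p:
                bad.append(i)
                break
    return bad


# Path columns a..d from the question
paths = [
    ("M", "a", "x", None),
    ("M", "a", None, None),
    ("J", "o", "h", "a"),
]
print(non_leaf_rows(paths))
```

For the table in the question this flags the second row (position 1); for the modified table above, where the second row ends in 'd', it flags nothing.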

Upvotes: 6
