equanimity
equanimity

Reputation: 2533

How to reshape a Pandas data frame before passing to a Plotly function?

I'm trying to create a table of data using the Table() function in Plotly.

My data is as follows:

import pandas as pd

test_df = pd.DataFrame({'Manufacturer':['Mercedes', 'Buick', 'Ford', 'Buick', 'Buick', 'Ford', 'Buick', 'Chrysler', 'Ford', 'Buick', 'Chrysler', 'Ford', 'Buick', 'Ford', 'Ford', 'Chrysler', 'Chrysler', 'Ford', 'Chrysler', 'Chrysler', 'Chrysler', 'Buick'],
                          'Metric':['MPG', 'Score', 'Score', 'Score', 'Score', 'Score', 'Score', 'Score', 'Score', 'Score', 'Score', 'Score', 'Score', 'Score', 'Score', 'Score', 'Score', 'Score', 'Score', 'Score', 'Score', 'Score'],
                          'Statistic':['External', 'Min', 'Max', 'Average', 'Median', '90th', '95th', '99th', 'Min', 'Max', 'Average', 'Median', '90th', '95th', '99th','Min', 'Max', 'Average', 'Median', '90th', '95th', '99th'],
                          'Value':[22, 3.405, 100.29, 4.62, 4.425, 5.34, 5.83, 7.75, 2.6323, 210, 4.193, 3.28, 5.04, 6.36, 11.01, 3.72, 43, 4.98, 4.82, 5.775, 6.18, 7.182],
                       })

I want to be able to create a table that looks like as follows:

Manufacturer    Min      Max      Average      Median      90th      95th      99th
Buick           3.405    210      4.62         4.425       5.04      5.83      7.182
Chrysler        3.72     43       4.193        4.82        5.775     6.18      7.75
Ford            2.6323   100.29   4.98         3.28        5.34      6.36      11.01

The code to do so looks something as follows (when hard-coded):

import plotly.graph_objects as go 
go.Figure(go.Table(
        header=dict(
            values=["Manufacturer", "Min", "Max",
                    "Average", "Median", "90th",
                    "95th", "99th"],
            font=dict(size=10),
            align="left"
        ),
        cells=dict(
            values=[['Buick', 'Ford', 'Chrysler'],    # Headers (could change based on the source file)
                    [3.405, 3.72, 2.6323],            # Min values
                    [210, 43, 100.29],                # Max values
                    [4.62, 4.193, 4.98],              # Average values
                    [4.425, 4.82, 3.28],              # Median values
                    [5.04, 5.775, 5.34],              # 90th percentile values
                    [5.83, 6.18, 6.36],               # 95th percentile values
                    [7.182, 7.75, 11.01]              # 99th percentile values
                   ],
            align = "left")                                     
))

enter image description here

According to the docs at https://plotly.com/python/table/, the cells argument argument expects a list of lists and can take a Pandas data frame (GREAT!).

Using the example in the docs, the code to pass a Pandas data frame would look something as follows:

# THIS IS THE EXAMPLE FROM THE DOCS (SHOWING THE USE OF A DATA FRAME)
fig = go.Figure(data=[go.Table(
    header=dict(values=list(df.columns),
                fill_color='paleturquoise',
                align='left'),
    cells=dict(values=[df.Rank, df.State, df.Postal, df.Population],
               fill_color='lavender',
               align='left'))
])

My most valiant attempt at this failed:

Filter by only the 'Score' records:

test_df_subset = test_df[(test_df['Metric'] == 'Score') & (test_df['Manufacturer'].isin(['Buick', 'Ford', 'Chrysler']))]

Create a pivot table:

temp_df = pd.pivot_table(data=test_df_subset,index=['Statistic', 'Manufacturer']) 

Unstack the pivot table:

temp_df.unstack(0)

Question: how would I reshape my test_df data frame to be able to pass it to the data and cells arguments in the go.Figure() function?

Thanks in advance!

Upvotes: 1

Views: 286

Answers (1)

Ben.T
Ben.T

Reputation: 29635

you were pretty close, here is one way

import plotly.graph_objects as go 

cols_ = ["Manufacturer", "Min", "Max",
         "Average", "Median", "90th",
         "95th", "99th"]
manufacturers = ['Buick', 'Ford', 'Chrysler']

#this is what you are looking for
df_ = (test_df[test_df['Manufacturer'].isin(manufacturers)]
              .set_index(['Manufacturer', 'Statistic'])
               ['Value'].unstack()
               .reset_index()[cols_]
               )

go.Figure(go.Table(
        header=dict(
            values=cols_,
            font=dict(size=10),
            align="left"
        ),
        cells=dict(
            values=df_.T, # note the T here
            align = "left")                                     
))

Compare to your method, I think df_ (in my notation) is equivalent to temp_df.unstack(0)['Value'].reset_index()[cols_] with your notation and using cols_ to order them as expected

Upvotes: 1

Related Questions