Jeff

Reputation: 8421

Plotly: How to create a barchart using group by?

I have a dataset as below:

import pandas as pd
data = dict(Pclass=[1,1,2,2,3,3],
            Survived = [0,1,0,1,0,1],
            CategorySize = [80,136,97,87,372,119] )

I need to create a bar chart with Plotly in Python, grouped by Pclass. Each group should contain two bars, one for Survived=0 and one for Survived=1, with CategorySize on the Y axis. That gives 6 bars in 3 groups.

Here is what I have tried:

import plotly.offline as pyo
import plotly.graph_objects as go

data = [ go.Bar( x = PclassSurvived.Pclass, y = PclassSurvived.CategorySize ) ]
layout = go.Layout(title= 'Pclass-Survived', xaxis = dict(title = 'Pclass'), yaxis = dict(title = 'CategorySize'),barmode='group' )
fig = go.Figure(data = data, layout = layout)

pyo.plot( fig, filename='./Output/Pclass-Survived.html')

But it is not what I need.

Upvotes: 1

Views: 11229

Answers (2)

Reslan Tinawi

Reputation: 557

This can be done easily with pandas' groupby and Plotly Express.

You should group your data by Pclass and Survived columns, and apply the sum aggregate function to the CategorySize column.

This way you get 6 groups with their aggregate values. Setting the barmode attribute to 'group' then places the two bars for each Pclass side by side. You can read more about barmode in the documentation.

The code:

import pandas as pd
import plotly.express as px

data = pd.DataFrame(
    dict(
        Pclass=[1, 1, 2, 2, 3, 3],
        Survived=[0, 1, 0, 1, 0, 1],
        CategorySize=[80, 136, 97, 87, 372, 119],
    )
)

Now you group the data:

grouped_df = data.groupby(by=["Pclass", "Survived"], as_index=False).agg(
    {"CategorySize": "sum"}
)

Then convert the Survived column values to strings, so that Plotly treats the column as a discrete variable rather than a numeric one:

grouped_df.Survived = grouped_df.Survived.map({0: "Died", 1: "Survived",})

Now, you should have:

   Pclass  Survived  CategorySize
0       1      Died            80
1       1  Survived           136
2       2      Died            97
3       2  Survived            87
4       3      Died           372
5       3  Survived           119

Finally, you visualize your data:

fig = px.bar(
    data_frame=grouped_df,
    x="Pclass",
    y="CategorySize",
    color="Survived",
    barmode="group",
)

fig.show()


Upvotes: 6

vestland

Reputation: 61084

I'm having trouble with your sample dataset. PclassSurvived.Pclass and PclassSurvived.CategorySize are not defined, and it's not 100% clear to me what you would like to accomplish here. But judging by your explanations and the structure of your dataset, it seems that this could get you somewhere:

Plot 1: grouped bar chart with one 'dead' and one 'alive' bar per Pclass (image not shown)

Code 1:

# imports
import plotly.graph_objs as go
import pandas as pd

data = dict(Pclass=[1,1,2,2,3,3],
            Survived = [0,1,0,1,0,1],
            CategorySize = [80,136,97,87,372,119] )
df=pd.DataFrame(data)

s0=df.query('Survived==0')
s1=df.query('Survived==1')

#layout = go.Layout(title= 'Pclass-Survived', xaxis = dict(title = 'Pclass'), yaxis = dict(title = 'CategorySize'),barmode='group' )
fig = go.Figure()

fig.add_trace(go.Bar(x=s0['Pclass'], y = s0['CategorySize'],
                    name='dead'
                    )
             )

fig.add_trace(go.Bar(x=s1['Pclass'], y = s1['CategorySize'],
                    name='alive'
                    )
             )

fig.update_layout(barmode='group')
fig.show()

Edit: You can produce the same plot using the plotly.offline module like this:

Code 2:

# Import the necessary libraries
import plotly.offline as pyo
import plotly.graph_objs as go
import pandas as pd

# Set notebook mode to work in offline
pyo.init_notebook_mode()

# data
data = dict(Pclass=[1,1,2,2,3,3],
            Survived = [0,1,0,1,0,1],
            CategorySize = [80,136,97,87,372,119] )
df=pd.DataFrame(data)

# split the rows by survival outcome
s0=df.query('Survived==0')
s1=df.query('Survived==1')

fig = go.Figure()

fig.add_trace(go.Bar(x=s0['Pclass'], y = s0['CategorySize'],
                    name='dead'
                    )
             )

fig.add_trace(go.Bar(x=s1['Pclass'], y = s1['CategorySize'],
                    name='alive'
                    )
             )

pyo.iplot(fig, filename = 'your-library')

Alternative approach with stacked bars:

Plot 2: stacked bar chart per Pclass with the total annotated above each bar (image not shown)

Code 3:

# imports
import plotly.graph_objs as go
import pandas as pd

data = dict(Pclass=[1,1,2,2,3,3],
            Survived = [0,1,0,1,0,1],
            CategorySize = [80,136,97,87,372,119] )
df=pd.DataFrame(data)

s0=df.query('Survived==0')
s1=df.query('Survived==1')

#layout = go.Layout(title= 'Pclass-Survived', xaxis = dict(title = 'Pclass'), yaxis = dict(title = 'CategorySize'),barmode='group' )
fig = go.Figure()

fig.add_trace(go.Bar(x=s0['Pclass'], y = s0['CategorySize'],
                    name='dead'
                    )
             )

fig.add_trace(go.Bar(x=s1['Pclass'], y = s1['CategorySize'],
                    name='alive'
                    )
             )

df_tot = df.groupby('Pclass').sum()

annot1 = [dict(
            x=xi,
            y=yi,
            text=str(yi),
            xanchor='auto',
            yanchor='bottom',
            showarrow=False,
        ) for xi, yi in zip(df_tot.index, df_tot['CategorySize'])]

fig.update_layout(barmode='stack', annotations=annot1)
fig.show()

Upvotes: 3
