Reputation: 8421
I have a dataset as below:
import pandas as pd
data = dict(Pclass=[1,1,2,2,3,3],
Survived = [0,1,0,1,0,1],
CategorySize = [80,136,97,87,372,119] )
I need to create a bar chart using plotly in Python, grouped by Pclass. Each group should have 2 bars, one for Survived=0 and one for Survived=1, and the Y axis should show CategorySize. So I need 6 bars in 3 groups.
Here is what i have tried:
import plotly.offline as pyo
import plotly.graph_objects as go
data = [ go.Bar( x = PclassSurvived.Pclass, y = PclassSurvived.CategorySize ) ]
layout = go.Layout(title= 'Pclass-Survived', xaxis = dict(title = 'Pclass'), yaxis = dict(title = 'CategorySize'),barmode='group' )
fig = go.Figure(data = data, layout = layout)
pyo.plot( fig, filename='./Output/Pclass-Survived.html')
But it is not what I need.
Upvotes: 1
Views: 11229
Reputation: 557
This can be done with pandas' groupby and Plotly Express.
Group your data by the Pclass and Survived columns, and apply the sum aggregate function to the CategorySize column.
This gives you 6 groups with their aggregate values. Setting the barmode attribute to 'group' then draws the two bars of each Pclass side by side; you can read more about it in the documentation.
The code:
import pandas as pd
import plotly.express as px
data = pd.DataFrame(
dict(
Pclass=[1, 1, 2, 2, 3, 3],
Survived=[0, 1, 0, 1, 0, 1],
CategorySize=[80, 136, 97, 87, 372, 119],
)
)
Now you group the data:
grouped_df = data.groupby(by=["Pclass", "Survived"], as_index=False).agg(
{"CategorySize": "sum"}
)
And convert the Survived column values to strings (so that plotly treats it as a discrete variable rather than a numeric one):
grouped_df.Survived = grouped_df.Survived.map({0: "Died", 1: "Survived",})
Now, you should have:
| | Pclass | Survived | CategorySize |
|---|---|---|---|
| 0 | 1 | Died | 80 |
| 1 | 1 | Survived | 136 |
| 2 | 2 | Died | 97 |
| 3 | 2 | Survived | 87 |
| 4 | 3 | Died | 372 |
| 5 | 3 | Survived | 119 |
Finally, you visualize your data:
fig = px.bar(
data_frame=grouped_df,
x="Pclass",
y="CategorySize",
color="Survived",
barmode="group",
)
fig.show()
Upvotes: 6
Reputation: 61084
I'm having trouble with your sample dataset. PclassSurvived.Pclass and PclassSurvived.CategorySize are not defined, and it's not 100% clear to me what you would like to accomplish here. But judging by your explanation and the structure of your dataset, it seems that this could get you somewhere:
Plot 1:
Code 1:
# imports
import plotly.graph_objs as go
import pandas as pd

# data
data = dict(Pclass=[1, 1, 2, 2, 3, 3],
            Survived=[0, 1, 0, 1, 0, 1],
            CategorySize=[80, 136, 97, 87, 372, 119])
df = pd.DataFrame(data)

# one subset per Survived value
s0 = df.query('Survived==0')
s1 = df.query('Survived==1')

# one trace per subset; barmode='group' places them side by side
fig = go.Figure()
fig.add_trace(go.Bar(x=s0['Pclass'], y=s0['CategorySize'], name='dead'))
fig.add_trace(go.Bar(x=s1['Pclass'], y=s1['CategorySize'], name='alive'))
fig.update_layout(barmode='group')
fig.show()
Edit: You can produce the same plot using the plotly.offline module like this:
Code 2:
# Import the necessary libraries
import plotly.offline as pyo
import plotly.graph_objs as go
import pandas as pd

# Set notebook mode to work offline
pyo.init_notebook_mode()

# data
data = dict(Pclass=[1, 1, 2, 2, 3, 3],
            Survived=[0, 1, 0, 1, 0, 1],
            CategorySize=[80, 136, 97, 87, 372, 119])
df = pd.DataFrame(data)

# one subset per Survived value
s0 = df.query('Survived==0')
s1 = df.query('Survived==1')

fig = go.Figure()
fig.add_trace(go.Bar(x=s0['Pclass'], y=s0['CategorySize'], name='dead'))
fig.add_trace(go.Bar(x=s1['Pclass'], y=s1['CategorySize'], name='alive'))
# 'group' is the default barmode; set explicitly to match Code 1
fig.update_layout(barmode='group')
pyo.iplot(fig, filename='your-library')
Alternative approach with stacked bars:
Plot 2:
Code 3:
# imports
import plotly.graph_objs as go
import pandas as pd

# data
data = dict(Pclass=[1, 1, 2, 2, 3, 3],
            Survived=[0, 1, 0, 1, 0, 1],
            CategorySize=[80, 136, 97, 87, 372, 119])
df = pd.DataFrame(data)

# one subset per Survived value
s0 = df.query('Survived==0')
s1 = df.query('Survived==1')

fig = go.Figure()
fig.add_trace(go.Bar(x=s0['Pclass'], y=s0['CategorySize'], name='dead'))
fig.add_trace(go.Bar(x=s1['Pclass'], y=s1['CategorySize'], name='alive'))

# total per Pclass, shown as a label on top of each stacked bar
df_tot = df.groupby('Pclass').sum()
annot1 = [dict(
    x=xi,
    y=yi,
    text=str(yi),
    xanchor='auto',
    yanchor='bottom',
    showarrow=False,
) for xi, yi in zip(df_tot.index, df_tot['CategorySize'])]

fig.update_layout(barmode='stack', annotations=annot1)
fig.show()
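To see what the annotations on the stacked bars will contain, the totals can be computed on their own. This is a small sketch using only pandas; selecting the `CategorySize` column before summing is my variation, so that the `Survived` column is not summed along with it.

```python
import pandas as pd

df = pd.DataFrame(
    dict(
        Pclass=[1, 1, 2, 2, 3, 3],
        Survived=[0, 1, 0, 1, 0, 1],
        CategorySize=[80, 136, 97, 87, 372, 119],
    )
)

# Sum only the column being plotted, one total per Pclass.
totals = df.groupby("Pclass")["CategorySize"].sum()

# One annotation per stacked bar, anchored at the top of the stack.
annotations = [
    dict(
        x=int(pclass),
        y=int(total),
        text=str(int(total)),
        yanchor="bottom",
        showarrow=False,
    )
    for pclass, total in totals.items()
]

print(totals.to_dict())  # {1: 216, 2: 184, 3: 491}
```

Each total is simply the sum of the two bars in that class, for example 80 + 136 = 216 for Pclass 1.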
Upvotes: 3