Reputation: 105
I have a Python Pandas dataframe in the following format:
gender | disease1 | disease2 |
---|---|---|
male | 0.82 | 0.76 |
female | 0.75 | 0.93 |
...... | .... | .... |
I'm looking to plot this in Python (matplotlib, plotly express, etc.) so that it looks something like this:
How can I restructure my dataframe and/or use a python visualisation library to achieve this result?
Upvotes: 1
Views: 873
Reputation: 62383
Use seaborn.catplot with kind='swarm' or kind='strip'.
seaborn is a high-level API for matplotlib.
'swarm' draws a categorical scatterplot with non-overlapping points, but if there are many points, consider using 'strip'.
Convert the dataframe from wide to long form with pandas.DataFrame.melt, and then plot.
Tested with python 3.8.11, pandas 1.3.2, matplotlib 3.4.3, seaborn 0.11.2
import pandas as pd
import numpy as np # only for sample data
import seaborn as sns
np.random.seed(365)
rows = 200
data = {'Gender': np.random.choice(['Male', 'Female'], size=rows),
        'Cancer': np.random.rand(rows).round(2),
        'Covid-19': np.random.rand(rows).round(2)}
df = pd.DataFrame(data)
# display(df.head())
Gender Cancer Covid-19
0 Male 0.82 0.88
1 Male 0.02 0.95
2 Female 0.28 0.92
3 Female 0.55 0.28
4 Male 0.15 0.46
# convert to long form
data = df.melt(id_vars='Gender', var_name='Disease')
# display(data.head())
Gender Disease value
0 Male Cancer 0.82
1 Male Cancer 0.02
2 Female Cancer 0.28
3 Female Cancer 0.55
4 Male Cancer 0.15
# plot
sns.catplot(data=data, x='Disease', y='value', hue='Gender', kind='swarm', palette=['blue', 'pink'], s=4)
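If the swarm gets too crowded, the 'strip' kind mentioned above can be swapped in. A minimal sketch on the same kind of sample data: the `dodge=True` and `jitter=0.2` settings and the name `long_df` are my own choices for illustration, not part of the tested code above.

```python
import numpy as np  # only for sample data
import pandas as pd
import seaborn as sns

np.random.seed(365)
rows = 200
df = pd.DataFrame({'Gender': np.random.choice(['Male', 'Female'], size=rows),
                   'Cancer': np.random.rand(rows).round(2),
                   'Covid-19': np.random.rand(rows).round(2)})

# wide -> long: one row per (Gender, Disease, value)
long_df = df.melt(id_vars='Gender', var_name='Disease')

# dodge=True separates the hue levels side by side within each category;
# jitter spreads the points horizontally so overlapping values stay visible
g = sns.catplot(data=long_df, x='Disease', y='value', hue='Gender',
                kind='strip', dodge=True, jitter=0.2,
                palette=['blue', 'pink'])
```

Unlike 'swarm', 'strip' allows points to overlap, so it stays fast with many rows at the cost of some readability.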
Upvotes: 1
Reputation: 19545
You can create a scatterplot in Plotly where disease1 is located at x=0 and disease2 is located at x=1 (and so on for more diseases), then rename the tick marks and set the color and offset of each marker depending on the gender.
The most dynamic way to make this plot is to add the data as you slice the DataFrame by disease and gender (I added some more points to your DataFrame to demonstrate that you can keep your DataFrame in the same format and achieve the desired plot):
import pandas as pd
import plotly.graph_objects as go
df = pd.DataFrame({'gender':['male','female','male','female'],'disease1':[0.82,0.75,0.60,0.24],'disease2':[0.76,0.93,0.51,0.44]})
fig = go.Figure()
offset = {'male': -0.1, 'female': 0.1}
marker_color_dict = {'male': 'teal', 'female':'pink'}
## set yaxis range
values = df[['disease1','disease2']].values.reshape(-1)
padding = 0.1
fig.update_yaxes(range=[min(values) - padding, 1.0])
for gender in ['male', 'female']:
    ## select the rows for this gender once
    mask = df['gender'] == gender
    for i, disease in enumerate(['disease1', 'disease2']):
        ## show one legend entry per gender (first disease only) to avoid duplicates
        showlegend = (i == 0)
        fig.add_trace(go.Scatter(
            x=[i + offset[gender]] * int(mask.sum()),
            y=df.loc[mask, disease].values,
            mode='markers',
            marker=dict(color=marker_color_dict[gender], size=20),
            legendgroup=gender,
            name=gender,
            showlegend=showlegend
        ))
fig.update_layout(
    xaxis=dict(
        tickmode='array',
        tickvals=[0.0, 1.0],
        ticktext=['disease1', 'disease2']
    )
)
fig.show()
Upvotes: 2