Jakub Mitura

Reputation: 167

Plotly: Categorical scatterplot formatting

I have a problem correctly formatting a scatterplot using pandas and Plotly. I would like to achieve something similar to the plot below (created with Google Sheets).

[Image: scatterplot created with Google Sheets]

Using exactly the same data in Google Colab with pandas and Plotly, I get a completely different visualization: all points are placed on a grid, which makes it very hard to spot outliers.

[Image: scatterplot created with pandas and Plotly in Google Colab]

How can I achieve a result similar to the Google Sheets plot using Python? Preferably an interactive one in Plotly.

import numpy as np
import pandas as pd
import hvplot.pandas
import plotly.express as px
import matplotlib.pyplot as plt

worksheet = spreadsheet.worksheet('left_to_right')

# get_all_values gives a list of rows.
rows = worksheet.get_all_values()
df = pd.DataFrame(rows)
headers = df.iloc[0]

cols = list(df.columns) 

pd.options.plotting.backend = "plotly" 
df.plot(
    kind='scatter',
    x=cols[0], 
    y=cols[1:], 
    width=1500,  
    height=1000 
)

Link to dataset: https://docs.google.com/spreadsheets/d/1NOHH9dUEAhRjrl0NWq_zUIgzUdYfupJjuEaXyRmTFEY/edit?usp=sharing

Upvotes: 2

Views: 672

Answers (1)

user11989081

Reputation: 8654

You could make a strip chart with Plotly Express; see the code below for an example.

import pandas as pd
import plotly.express as px

# load the data
df = pd.read_csv('Data.csv', header=None)

# prepare the data
df = df.melt(id_vars=df.columns[0])
df = df.drop(labels=['variable'], axis=1)
df.columns = ['variable', 'value']
df = df.sort_values(by='variable')
df = df.reset_index(drop=True)
df
#        variable     value
# 0     Amygdala   1.066667
# 1     Amygdala   1.057650
# 2     Amygdala   1.117117
# 3     Amygdala   1.007353
# 4     Amygdala   0.979522
#          ...       ...
# 1075  Thalamus   1.019973
# 1076  Thalamus   1.001422
# 1077  Thalamus   1.037945
# 1078  Thalamus   0.963793
# 1079  Thalamus   1.012915

# plot the data
fig = px.strip(df, x='variable', y='value', color='value', stripmode='overlay')

fig.update_layout(plot_bgcolor='white',
                  paper_bgcolor='white',
                  showlegend=False,
                  xaxis=dict(title=None, linecolor='gray', mirror=True),
                  yaxis=dict(title=None, linecolor='gray', mirror=True))

fig.show()

[Image: resulting strip chart]
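
If the data comes straight from the worksheet instead of a CSV file, the same reshaping can be sketched on a small made-up table. This is only an illustration under assumptions: the toy values and the helper name `to_long` are hypothetical, and it assumes the cells arrive as strings (as they appear to when read with `get_all_values()`), so they are converted to numbers before plotting. String values would likely be treated as categories, which may explain the grid-like plot in the question.

```python
import pandas as pd

# Hypothetical toy data in the same wide layout as the sheet:
# column 0 holds the label, the remaining columns hold measurements.
# The measurements are strings on purpose, to mimic raw worksheet cells.
wide = pd.DataFrame([
    ["Amygdala", "1.07", "1.06", "1.12"],
    ["Thalamus", "1.02", "1.00", "1.04"],
])


def to_long(wide_df):
    """Reshape one-row-per-category data into one-row-per-observation."""
    label_col = wide_df.columns[0]
    long_df = wide_df.melt(id_vars=label_col, value_name="value")
    # Drop the melted column index before renaming the label column,
    # so that two columns are never both called 'variable'.
    long_df = long_df.drop(columns="variable")
    long_df = long_df.rename(columns={label_col: "variable"})
    # Convert text cells to floats; unparseable cells become NaN.
    long_df["value"] = pd.to_numeric(long_df["value"], errors="coerce")
    # A stable sort keeps the original order of values within each label.
    return long_df.sort_values("variable", kind="stable").reset_index(drop=True)


tidy = to_long(wide)
print(tidy)
```

The resulting `tidy` frame has the same two columns (`variable`, `value`) that `px.strip` is given above. If the points look too tightly or too loosely scattered, the spread can probably be tuned with `fig.update_traces(jitter=...)`, since the strip chart is drawn with box traces that expose a `jitter` attribute.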

Upvotes: 3
