Reputation: 167
I have a problem correctly formatting a scatter plot using pandas and Plotly. I would like to achieve something similar to the plot below (created with Google Sheets).
With exactly the same data in Google Colab, using pandas and Plotly, I get a completely different visualization: all points are placed on a grid, which makes it very hard to see outliers.
How can I achieve a result similar to the Google Sheets plot using Python? Preferably an interactive one in Plotly.
import numpy as np
import pandas as pd
import hvplot.pandas
import plotly.express as px
import matplotlib.pyplot as plt
worksheet = spreadsheet.worksheet('left_to_right')
# get_all_values gives a list of rows.
rows = worksheet.get_all_values()
df = pd.DataFrame(rows)
headers = df.iloc[0]
cols = list(df.columns)
pd.options.plotting.backend = "plotly"
df.plot(
    kind='scatter',
    x=cols[0],
    y=cols[1:],
    width=1500,
    height=1000
)
Link to dataset: https://docs.google.com/spreadsheets/d/1NOHH9dUEAhRjrl0NWq_zUIgzUdYfupJjuEaXyRmTFEY/edit?usp=sharing
Upvotes: 2
Views: 672
Reputation: 8654
You could make a strip chart with Plotly Express; the code below shows an example.
import pandas as pd
import plotly.express as px
# load the data
df = pd.read_csv('Data.csv', header=None)
# prepare the data
df = df.melt(id_vars=df.columns[0])
df = df.drop(labels=['variable'], axis=1)
df.columns = ['variable', 'value']
df = df.sort_values(by='variable')
df = df.reset_index(drop=True)
df
#       variable     value
# 0     Amygdala  1.066667
# 1     Amygdala  1.057650
# 2     Amygdala  1.117117
# 3     Amygdala  1.007353
# 4     Amygdala  0.979522
# ...        ...       ...
# 1075  Thalamus  1.019973
# 1076  Thalamus  1.001422
# 1077  Thalamus  1.037945
# 1078  Thalamus  0.963793
# 1079  Thalamus  1.012915
# plot the data
fig = px.strip(df, x='variable', y='value', color='value', stripmode='overlay')
fig.update_layout(plot_bgcolor='white',
                  paper_bgcolor='white',
                  showlegend=False,
                  xaxis=dict(title=None, linecolor='gray', mirror=True),
                  yaxis=dict(title=None, linecolor='gray', mirror=True))
fig.show()
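If you would rather keep reading from the worksheet than export a CSV, the same reshaping can be applied to the list of rows. One caveat, as far as I can tell: get_all_values hands back every cell as a string, and Plotly treats a string axis as categorical, which is a likely reason the points snap to a grid in the question. Below is a minimal sketch under those assumptions. The helper names rows_to_long and make_strip_figure and the toy numbers are made up for illustration; it also assumes each row starts with the region label and has no header row, as in Data.csv above.

```python
import pandas as pd


def rows_to_long(rows):
    """Reshape rows of strings (label first, then readings) into long form."""
    wide = pd.DataFrame(rows)
    # Column 0 holds the label; every other column holds one reading.
    tidy = wide.melt(id_vars=[wide.columns[0]])
    tidy = tidy.drop(columns="variable")
    tidy.columns = ["variable", "value"]
    # Strings become floats; cells that cannot be parsed (e.g. empty) become NaN.
    tidy["value"] = pd.to_numeric(tidy["value"], errors="coerce")
    tidy = tidy.dropna(subset=["value"])
    # A stable sort keeps the original reading order within each label.
    return tidy.sort_values(by="variable", kind="stable").reset_index(drop=True)


def make_strip_figure(tidy):
    """Build a strip chart from the long-form frame (hypothetical helper)."""
    # Imported here so the reshaping helper can be used without Plotly installed.
    import plotly.express as px

    fig = px.strip(tidy, x="variable", y="value", stripmode="overlay")
    fig.update_layout(plot_bgcolor="white", showlegend=False)
    return fig


# Toy rows standing in for worksheet.get_all_values(); the numbers are invented.
toy_rows = [
    ["Amygdala", "1.07", "0.98"],
    ["Thalamus", "1.02", ""],
]
long_df = rows_to_long(toy_rows)
```

With the real sheet you would pass the rows from the question instead of toy_rows and then call make_strip_figure(long_df).show(). The numeric conversion is the part that matters for the y axis; the styling can be swapped for the update_layout call above.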
Upvotes: 3