Reputation: 205
I'm trying to plot two sets of data in categories, or at least using string values for the X and Y axis grid points. I've seen some examples like here, but they use a bar graph instead of a scatter plot, and I haven't figured out how to make it work. I'd like to be able to add a positive or negative offset to the points based on the trace or on the data associated with each point. So, for example, if the Up points were moved up above the grid line and the Down points were moved just below the grid, that would be ideal. Right now, as you can see, they overlap.
import plotly.graph_objs as go
import pandas as pd

data = {}
data['Tx'] = ['A', 'B', 'C', 'D', 'D', 'D', 'E', 'C', 'A', 'E', 'B', 'C', 'A', 'B', 'E']
data['Rx'] = ['A', 'E', 'C', 'B', 'B', 'E', 'D', 'C', 'B', 'C', 'A', 'B', 'A', 'E', 'D']
data['Direction'] = ['Up', 'Down', 'Down', 'Down', 'Up', 'Up', 'Up', 'Down', 'Up', 'Down', 'Down', 'Up', 'Up', 'Down', 'Up']
data['Metric'] = [1.2, 3.5, 4.5, 2, 8, 2, 5.6, 7, 9, 1, 5, 2.6, 13, .5, 4.8]

# copy data to dataframe
tempDF = pd.DataFrame(data)

# marker symbol 5 is triangle-up, 6 is triangle-down
# (.loc avoids chained assignment, which pandas may silently ignore)
tempDF['markers'] = 5
tempDF.loc[tempDF['Direction'] == 'Down', 'markers'] = 6
tempDF['colors'] = 'red'
tempDF.loc[tempDF['Direction'] == 'Down', 'colors'] = 'blue'

fig = go.Figure()
for direction in ['Up', 'Down']:
    fig.add_trace(
        go.Scatter(
            mode='markers',
            x=tempDF['Tx'][tempDF['Direction'] == direction],
            y=tempDF['Rx'][tempDF['Direction'] == direction],
            # x=tempDF['Tx'],
            # y=tempDF['Rx'],
            marker_size=15,
            marker_symbol=tempDF['markers'][tempDF['Direction'] == direction],  # Triangle-up or down
            marker=dict(
                color=tempDF['colors'][tempDF['Direction'] == direction],
                size=20,
                line=dict(
                    color='MediumPurple',
                    width=2
                )
            ),
            name=direction,
            hovertemplate="%{y} <- %{x}<br>count: 5/10<br> Pct: 10 <br>Dir %{name}<extra></extra>"
        )
    )

# set axis order
fig.update_layout(
    xaxis={'categoryorder': 'array', 'categoryarray': ['A', 'B', 'C', 'D', 'E']},
    yaxis={'categoryorder': 'array', 'categoryarray': ['A', 'B', 'C', 'D', 'E'][::-1]}
)
fig.show()
Edit:
As J_H suggested, I was able to map the categories to numerical values and then add an offset to those values to move the points up or down. I did this with the tickvals and ticktext properties of the xaxis and yaxis dictionaries in the figure layout. Doing so caused another problem when hovering over the points on the plot, though. If a point falls exactly on an axis value (on 'A', 'B', etc. on the x axis in my example), the hover text reads 'A' or 'B', but if the point is offset numerically, the hover text shows the number rather than the string. To correct this, I needed to use customdata and hovertemplate in the trace properties to set the displayed values back to the original strings. Here's the code, updated to show these changes.
import plotly.graph_objs as go
import pandas as pd
import numpy as np

data = {}
possibleCategories = ['A', 'B', 'C', 'D', 'E']
numericalValues = [1, 2, 3, 4, 5]
offset = .1
data['Tx'] = ['A', 'B', 'C', 'D', 'D', 'D', 'E', 'C', 'A', 'E', 'B', 'C', 'A', 'B', 'E']
data['Rx'] = ['A', 'E', 'C', 'B', 'B', 'E', 'D', 'C', 'B', 'C', 'A', 'B', 'A', 'E', 'D']
data['Direction'] = ['Up', 'Down', 'Down', 'Down', 'Up', 'Up', 'Up', 'Down', 'Up', 'Down', 'Down', 'Up', 'Up', 'Down', 'Up']
data['Metric'] = [1.2, 3.5, 4.5, 2, 8, 2, 5.6, 7, 9, 1, 5, 2.6, 13, .5, 4.8]
data['yValue'] = len(data['Tx'])*[-1]  # pre allocate numerical value arrays
data['xValue'] = len(data['Tx'])*[-1]
data['markers'] = len(data['Tx'])*[5]  # default marker value to be an up arrow
data['colors'] = len(data['Tx'])*["red"]  # default color to red

# transform all the lists into numpy arrays
for tempKey in data.keys():
    data[tempKey] = np.array(data[tempKey], dtype="object")

# create numerical values for the categories. The Y axis will have an offset, but not the x axis
for i in range(len(data['Tx'])):
    if data['Direction'][i] == 'Up':
        data['yValue'][i] = numericalValues[possibleCategories.index(data['Rx'][i])] + offset
    else:
        data['yValue'][i] = numericalValues[possibleCategories.index(data['Rx'][i])] - offset
    data['xValue'][i] = numericalValues[possibleCategories.index(data['Tx'][i])]

# set markers and colors
downIndexs = np.where(data['Direction'] == 'Down')
data['markers'][downIndexs] = 6
data['colors'][downIndexs] = "blue"

# copy data to dataframe
tempDF = pd.DataFrame(columns=list(data.keys()))
for tempKey in list(data.keys()):
    tempDF[tempKey] = data[tempKey]

fig = go.Figure()
for direction in ['Up', 'Down']:
    fig.add_trace(
        go.Scatter(
            mode='markers',
            x=tempDF['xValue'][tempDF['Direction'] == direction],
            y=tempDF['yValue'][tempDF['Direction'] == direction],
            # x=tempDF['Tx'],
            # y=tempDF['Rx'],
            marker_size=15,
            marker_symbol=tempDF['markers'][tempDF['Direction'] == direction],  # Triangle-up or down
            marker=dict(
                color=tempDF['colors'][tempDF['Direction'] == direction],
                size=20,
                line=dict(
                    color='MediumPurple',
                    width=2
                )
            ),
            name=direction,
            # customdata carries the original strings so the hover text can show them
            customdata=np.stack((tempDF['Rx'][tempDF['Direction'] == direction], tempDF['Tx'][tempDF['Direction'] == direction], tempDF['Metric'][tempDF['Direction'] == direction]), axis=-1),
            hovertemplate="<br>".join([
                '%{customdata[0]} <- %{customdata[1]}',
                'metric: = %{customdata[2]}',
                'Dir: ' + direction,
                '<extra></extra>'
            ])
        )
    )

# set axis order
fig.update_layout(
    xaxis=dict(
        tickmode='array',
        tickvals=numericalValues,
        ticktext=possibleCategories,
        range=[min(numericalValues)-1, max(numericalValues)+1],
        side='top'
    ),
    yaxis=dict(
        tickmode='array',
        tickvals=numericalValues,
        ticktext=possibleCategories,
        range=[max(numericalValues)+1, min(numericalValues)-1]
    ),
)
fig.show()
Upvotes: 1
Views: 1140
Reputation: 20450
We wish to avoid plotting one symbol atop another.
if the Up points were moved up above the grid line and the Down points were moved just below the grid, that would be ideal.
Yes, you are certainly free to do that at the app level, by munging the (x, y) values before passing them to plotly. In your example this amounts to mapping letters to numeric values, tweaking them, and passing them to the library.
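As a rough illustration of that munging step, here is a minimal sketch in plain Python. The names CATEGORIES, POSITION and shifted_y are made up for this example, and the 0.1 offset is an arbitrary assumption; only the resulting numbers would be handed to plotly.

```python
# Hypothetical helper names; the category list mirrors the question's data.
CATEGORIES = ["A", "B", "C", "D", "E"]
POSITION = {name: i + 1 for i, name in enumerate(CATEGORIES)}  # 'A' -> 1, 'B' -> 2, ...


def shifted_y(rx, direction, offset=0.1):
    """Return the numeric y position for a category, nudged by direction.

    'Up' adds the offset; any other direction subtracts it.
    """
    sign = 1 if direction == "Up" else -1
    return POSITION[rx] + sign * offset


rx = ["A", "E", "C"]
direction = ["Up", "Down", "Down"]
y_values = [shifted_y(r, d) for r, d in zip(rx, direction)]
print(y_values)
```

A dictionary lookup keeps the mapping explicit, so the same table can later supply the tick positions and tick labels for the axis.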
For values that are not already discretized, the more general problem is to find collisions: to find data points p1 & p2 within a small distance d that should be perturbed to make the distance exceed d.
To perform this in linear rather than quadratic time, assuming some reasonable input distribution, it is enough to discretize continuous input values to a desired grid size. This lets us get away with an exact equality test, which is easier than worrying about a distance metric. Store the discretized values in a set, and perturb upon noticing a collision. Use min( ... ) - d and max( ... ) + d so it won't matter which point was above or below.
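The idea above might be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: spread_collisions is a hypothetical name, a dict stands in for the set so the first occupant of a cell can be looked up again, only the y coordinate is perturbed, and at most two points are assumed to share a cell.

```python
def spread_collisions(points, grid, d):
    """Nudge apart pairs of (x, y) points that share a grid cell.

    Assumption: at most two points land in any one cell.
    """
    first_in_cell = {}           # discretized cell -> index of its first occupant
    ys = [y for _, y in points]  # working copy of the y values
    for i, (x, y) in enumerate(points):
        cell = (round(x / grid), round(y / grid))  # key for the exact-equality test
        j = first_in_cell.setdefault(cell, i)
        if j == i:
            continue             # cell was empty, so there is no collision
        low, high = min(ys[j], y) - d, max(ys[j], y) + d
        # Whichever point was lower moves down; the other moves up.
        if ys[j] <= y:
            ys[j], ys[i] = low, high
        else:
            ys[j], ys[i] = high, low
    return [(x, y) for (x, _), y in zip(points, ys)]


points = [(1.0, 2.0), (1.02, 2.01), (3.0, 4.0)]
spread = spread_collisions(points, grid=0.5, d=0.1)
print(spread)
```

Each point costs one dictionary lookup, so the pass is linear in the number of points. The trade-off is that two close points sitting on opposite sides of a cell boundary are not detected.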
If you can use the seaborn library, a swarmplot or stripplot would be the natural approach.
Perhaps you're looking for this function: https://plotly.com/python-api-reference/generated/plotly.express.strip.html
EDIT
The ord() function will map characters to ordinal values for you:
>>> for ch in 'ABC':
...     print(ch, ord(ch), ord(ch) - ord('A'))
...
A 65 0
B 66 1
C 67 2
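Building on that, a small sketch of how ord() could produce the numeric positions and matching tick labels. The names positions and axis_settings are illustrative only, and the approach assumes every category is a single consecutive capital letter; tickmode, tickvals and ticktext are the axis properties used in the question's edit.

```python
categories = ["A", "B", "C", "D", "E"]

# Map each single-letter category to a zero-based position: 'A' -> 0, 'B' -> 1, ...
positions = {ch: ord(ch) - ord("A") for ch in categories}

# Axis settings in the shape the question passes to fig.update_layout(xaxis=...).
axis_settings = dict(
    tickmode="array",
    tickvals=list(positions.values()),
    ticktext=categories,
)
print(positions)
```

For multi-character or non-consecutive labels, an explicit lookup table would be the safer choice.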
Upvotes: 1