Scatter plot with multiple categories so the points don't overlap

Question

I'm trying to plot two sets of data by category, or at least use string values for the X and Y axis grid points. I've seen some examples like here, but they use a bar graph instead of a scatter plot and I haven't figured out how to make that work. I'd like to be able to add a positive or negative offset to the points based on the trace or the data associated with each point. So, for example, if the Up points were moved up above the grid line and the Down points were moved just below the grid, that would be ideal. Right now you can see they overlap.

    import plotly.graph_objs as go
    import pandas as pd
    
    
    data = {}
    
    data['Tx'] = ['A', 'B', 'C', 'D', 'D', 'D', 'E', 'C', 'A', 'E', 'B', 'C', 'A', 'B', 'E']
    data['Rx'] = ['A', 'E', 'C', 'B', 'B', 'E', 'D', 'C', 'B', 'C', 'A', 'B', 'A', 'E', 'D']
    data['Direction'] = ['Up', 'Down', 'Down', 'Down','Up', 'Up', 'Up', 'Down', 'Up', 'Down', 'Down', 'Up', 'Up', 'Down', 'Up']
    data['Metric'] = [1.2, 3.5, 4.5, 2, 8, 2, 5.6, 7, 9, 1, 5, 2.6, 13, .5, 4.8]
    
    #copy data to dataframe
    tempDF = pd.DataFrame(columns=list(data.keys()))
    for tempKey in list(data.keys()):
        tempDF[tempKey] = data[tempKey]
    
    # marker symbols: 5 = triangle-up, 6 = triangle-down
    tempDF['markers'] = len(tempDF)*[5]
    # use .loc rather than chained indexing so the assignment reliably updates tempDF
    tempDF.loc[tempDF['Direction'] == 'Down', 'markers'] = 6
    
    tempDF['colors'] = len(tempDF)*['red']
    tempDF.loc[tempDF['Direction'] == 'Down', 'colors'] = 'blue'
    
    fig = go.Figure()
    
    for direction in ['Up', 'Down']:
        fig.add_trace(
            go.Scatter(
                mode='markers',
                x=tempDF['Tx'][tempDF['Direction'] == direction],
                y=tempDF['Rx'][tempDF['Direction'] == direction],
                # x=tempDF['Tx'],
                # y=tempDF['Rx'],
                marker_size=15,
                marker_symbol=tempDF['markers'][tempDF['Direction'] == direction],  # Triangle-up or down
                marker=dict(
                    color=tempDF['colors'][tempDF['Direction'] == direction],
                    size=20,
                    line=dict(
                        color='MediumPurple',
                        width=2
                    )
                ),
                name=direction,
                    hovertemplate="%{y} <- %{x}<br>count: 5/10<br> Pct: 10 <br>Dir %{name}"
    
            )
        )
    
    #set axis order
    fig.update_layout(xaxis={'categoryorder':'array', 'categoryarray':['A', 'B', 'C', 'D', 'E']},
                      yaxis={'categoryorder':'array', 'categoryarray':['A', 'B', 'C', 'D', 'E'][::-1]}
    
                      )
    fig.show()

Edit: As J_H suggested, I was able to map the categories to numerical values and then add an offset to move the points up or down. I did this with the tickvals and ticktext properties of the xaxis and yaxis dictionaries in the figure layout. Doing so caused another problem when hovering over the points on the plot, though. If a point falls exactly on an axis value (on 'A', 'B', etc. on the x axis in my example), the hover text reads 'A' or 'B', but if the point is offset numerically, the hover text shows the number rather than the string. To correct this, I needed to use customdata and hovertemplate in the trace properties to set the hover text back to the original values. Here's the updated code and plot showing these changes.

import plotly.graph_objs as go
import pandas as pd
import numpy as np


data = {}
possibleCategories = ['A', 'B', 'C', 'D', 'E']
numericalValues = [1, 2, 3, 4, 5]
offset = .1
data['Tx'] = ['A', 'B', 'C', 'D', 'D', 'D', 'E', 'C', 'A', 'E', 'B', 'C', 'A', 'B', 'E']
data['Rx'] = ['A', 'E', 'C', 'B', 'B', 'E', 'D', 'C', 'B', 'C', 'A', 'B', 'A', 'E', 'D']
data['Direction'] = ['Up', 'Down', 'Down', 'Down','Up', 'Up', 'Up', 'Down', 'Up', 'Down', 'Down', 'Up', 'Up', 'Down', 'Up']
data['Metric'] = [1.2, 3.5, 4.5, 2, 8, 2, 5.6, 7, 9, 1, 5, 2.6, 13, .5, 4.8]
data['yValue'] = len(data['Tx'])*[-1]  # pre allocate numerical value arrays
data['xValue'] = len(data['Tx'])*[-1]
data['markers'] = len(data['Tx'])*[5]  # default marker value to be an up arrow
data['colors'] = len(data['Tx'])*["red"]  # default color to red

for tempKey in data.keys(): data[tempKey] = np.array(data[tempKey], dtype="object")  # transform all the lists into numpy arrays

# create numerical values for the categories. The Y axis will have an offset, but not the x axis
for i in range(len(data['Tx'])):
    if data['Direction'][i] == 'Up':
        data['yValue'][i] = numericalValues[possibleCategories.index(data['Rx'][i])]+offset
    else:
        data['yValue'][i] = numericalValues[possibleCategories.index(data['Rx'][i])]-offset
    data['xValue'][i] = numericalValues[possibleCategories.index(data['Tx'][i])]

# set markers and colors
downIndexs = np.where(data['Direction'] == 'Down')
data['markers'][downIndexs] = 6
data['colors'][downIndexs] = "blue"


#copy data to dataframe
tempDF = pd.DataFrame(columns=list(data.keys()))
for tempKey in list(data.keys()):
    tempDF[tempKey] = data[tempKey]

fig = go.Figure()

for direction in ['Up', 'Down']:
    fig.add_trace(
        go.Scatter(
            mode='markers',
            x=tempDF['xValue'][tempDF['Direction'] == direction],
            y=tempDF['yValue'][tempDF['Direction'] == direction],
            # x=tempDF['Tx'],
            # y=tempDF['Rx'],
            marker_size=15,
            marker_symbol=tempDF['markers'][tempDF['Direction'] == direction],  # Triangle-up or down
            marker=dict(
                color=tempDF['colors'][tempDF['Direction'] == direction],
                size=20,
                line=dict(
                    color='MediumPurple',
                    width=2
                )
            ),
            name=direction,
            customdata=np.stack((tempDF['Rx'][tempDF['Direction'] == direction], tempDF['Tx'][tempDF['Direction'] == direction], tempDF['Metric'][tempDF['Direction'] == direction]), axis=-1),
            hovertemplate="<br>".join([
                '%{customdata[0]} <- %{customdata[1]}',
                'metric: = %{customdata[2]}',
                'Dir: ' + direction,
                ''
            ])
        )
    )

#set axis order

fig.update_layout(
    xaxis=dict(
        tickmode='array',
        tickvals=numericalValues,
        ticktext=possibleCategories,
        range=[min(numericalValues)-1, max(numericalValues)+1],
        side='top'
    ),
    yaxis=dict(
        tickmode='array',
        tickvals=numericalValues,
        ticktext=possibleCategories,
        range=[max(numericalValues)+1, min(numericalValues)-1 ]
    ),
)

fig.show()

J_H · Accepted Answer

We wish to avoid plotting one symbol atop another.

if the Up points were moved up above the grid line and the Down points were moved just below the grid, that would be ideal.

Yes, you are certainly free to do that at the app level, by munging the (x, y) values before passing them to plotly. In your example this amounts to mapping letters to numeric values, tweaking them, and passing them to the library.

For values that are not already discretized, the more general problem is to find collisions, to find data points p1 & p2 within a small distance d that should be perturbed to make the distance exceed d.

To perform this in linear rather than quadratic time, assuming some reasonable input distribution, it is enough to discretize continuous input values to a desired grid size. This lets us get away with an exact equality test, which is easier than worrying about a distance metric. Store the discretized values in a set, and perturb upon noticing a collision. Use min( ... ) - d and max( ... ) + d so it won't matter which point was above or below.
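A rough sketch of that idea follows. The function name spread_collisions and the parameters grid and d are made up for illustration. It also swaps in a small variation: instead of min( ... ) - d and max( ... ) + d for a pair, it spaces every point in a colliding cell 2*d apart around the cell's mean y, so cells with more than two points are handled too. A dict keyed by cell is used rather than a bare set so we can remember which points landed there.

```python
import math
from collections import defaultdict


def spread_collisions(points, grid=1.0, d=0.1):
    """Return a copy of `points` with same-cell points nudged apart in y.

    `points` is a list of (x, y) tuples. `grid` (cell size) and `d`
    (half the spacing between neighbours) are illustrative parameters.
    """
    # Bucket point indices by discretized cell: one pass, exact-equality keys.
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[(math.floor(x / grid), math.floor(y / grid))].append(i)

    result = list(points)
    for members in cells.values():
        if len(members) < 2:
            continue  # nothing collides in this cell
        # Sort by y so a point that started lower also ends up lower.
        members.sort(key=lambda i: points[i][1])
        mean_y = sum(points[i][1] for i in members) / len(members)
        for rank, i in enumerate(members):
            shift = (rank - (len(members) - 1) / 2) * 2 * d
            result[i] = (points[i][0], mean_y + shift)
    return result


pts = [(1.0, 2.0), (1.0, 2.0), (3.0, 4.0)]
print(spread_collisions(pts))
```

Note that binning with floor only catches points that share a cell. Two points sitting close together on opposite sides of a cell boundary are not treated as a collision, which is the price of the cheap equality test.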

If you can use the seaborn library, a swarmplot or stripplot would be the natural approach. Perhaps you're looking for this function: https://plotly.com/python-api-reference/generated/plotly.express.strip.html

EDIT

The ord() function will map characters to ordinal values for you:

>>> for ch in 'ABC':
...     print(ch, ord(ch), ord(ch) - ord('A'))
... 
A 65 0
B 66 1
C 67 2
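
Putting ord() together with the offset idea, a sketch for single-letter categories could look like the following. The helper name category_positions and its parameters are hypothetical, and it assumes every label is one character at or after the base letter.

```python
def category_positions(labels, directions, offset=0.1, base='A'):
    """Map single-letter labels to numbers, nudged by direction.

    'A' maps to 1, 'B' to 2, and so on. 'Up' rows get +offset and all
    other rows get -offset. On a reversed axis the visual direction flips.
    """
    positions = []
    for label, direction in zip(labels, directions):
        value = ord(label) - ord(base) + 1
        shift = offset if direction == 'Up' else -offset
        positions.append(value + shift)
    return positions


rx = ['A', 'E', 'C', 'B']
direction = ['Up', 'Down', 'Down', 'Up']
y_values = category_positions(rx, direction)

# Tick positions and labels that put the letters back on the axis.
ticktext = list('ABCDE')
tickvals = [ord(ch) - ord('A') + 1 for ch in ticktext]
print(y_values, tickvals, ticktext)
```

The resulting tickvals and ticktext lists can be passed to the axis layout with tickmode='array', so the axis still shows letters while the points sit slightly off the grid lines.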
