justanotherguy

Reputation: 83

Multiple opacities in Mapbox - Plotly for Python

I am currently working on a Data Visualization project.

I want to plot multiple lines (about 200k) that represent journeys from each subway station to all the others. That is, every pair of subway stations should be connected by a straight line.

The color of the line doesn't really matter (it could be red, blue, etc.); opacity is what matters most. The more journeys there are between two stations, the more opaque that particular line should be, and vice versa.

I feel I am close to the desired output, but can't figure out a way to do it properly.

The DataFrame I am using (df = pd.read_csv(...)) consists of a series of columns, namely: id_start_station, id_end_station, lat_start_station, long_start_station, lat_end_station, long_end_station, number_of_journeys.

I extracted the coordinates with:

# start, end, separator: every third slot breaks the line between journeys
lons = np.empty(3 * len(df))
lons[::3] = df['long_start_station']
lons[1::3] = df['long_end_station']
lons[2::3] = None

lats = np.empty(3 * len(df))
lats[::3] = df['lat_start_station']
lats[1::3] = df['lat_end_station']
lats[2::3] = None
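
The interleaving above can also be written as one small helper. This is only a sketch: the function name `interleave_segments` and the tiny sample frame are made up for illustration, and it assumes NaN can stand in for the `None` separator in a float array.

```python
import numpy as np
import pandas as pd

def interleave_segments(start, end):
    """Return start0, end0, NaN, start1, end1, NaN, ... as one flat float array."""
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    gap = np.full(start.shape, np.nan)  # separator slot between consecutive segments
    return np.column_stack([start, end, gap]).ravel()

# Tiny made-up frame using the column names from the question
df = pd.DataFrame({
    "lat_start_station": [51.50, 51.51],
    "long_start_station": [-0.10, -0.12],
    "lat_end_station": [51.52, 51.53],
    "long_end_station": [-0.11, -0.13],
})

lats = interleave_segments(df["lat_start_station"], df["lat_end_station"])
lons = interleave_segments(df["long_start_station"], df["long_end_station"])
```

Both arrays have length 3 * len(df) and can be passed as lat and lon in the same way as the arrays built by hand.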

I then started a figure by:

fig = go.Figure()

and then added a trace by:

fig.add_trace(go.Scattermapbox(
        name='Journeys',
        lat=lats,
        lon=lons,
        mode='lines',
        line=dict(color='red', width=1),
        opacity= ¿?, # PROBLEM IS HERE [1]
    ))

[1] So I tried a few different things to pass an opacity term:

  1. I created a new array with an opacity for each line segment:
opacity = np.empty(3 * len(df))
opacity[::3] = df['number_of_journeys'] / max(df['number_of_journeys'])
opacity[1::3] = df['number_of_journeys'] / max(df['number_of_journeys'])
opacity[2::3] = None

and passed it into [1], but this error came out:

ValueError: 
    Invalid value of type 'numpy.ndarray' received for the 'opacity' property of scattermapbox

    The 'opacity' property is a number and may be specified as:
      - An int or float in the interval [0, 1]
  2. I then thought of passing the opacity through the "color" property instead, using the alpha channel of an rgba color, such as: rgba(255,0,0,0.5).

So I first created a "map" of all alpha parameters:

df['alpha'] = df['number_of_journeys'] / max(df['number_of_journeys'])

and then created a function that builds an rgba color string for each alpha value:

colors_with_opacity = []

def colors_with_opacity_func(df, empty_list):
    for alpha in df['alpha']:
      empty_list.extend(["rgba(255,0,0,"+str(alpha)+")"])
      empty_list.extend(["rgba(255,0,0,"+str(alpha)+")"])
      empty_list.append(None)
      

colors_with_opacity_func(df, colors_with_opacity)

and passed that into the color attribute of the Scattermapbox, but got the following error:

ValueError:
    Invalid value of type 'builtins.list' received for the 'color' property of scattermapbox.line

    The 'color' property is a color and may be specified as:
      - A hex string (e.g. '#ff0000')
      - An rgb/rgba string (e.g. 'rgb(255,0,0)')
      - An hsl/hsla string (e.g. 'hsl(0,100%,50%)')
      - An hsv/hsva string (e.g. 'hsv(0,100%,100%)')
      - A named CSS color:
            aliceblue, antiquewhite, aqua, [...] , whitesmoke,
            yellow, yellowgreen

Since there is a massive number of lines, adding one trace per line in a loop would cause performance issues.

Any help will be much appreciated. I can't figure out a way to accomplish this properly.

Thank you, in advance.

EDIT 1 : NEW QUESTION ADDED

I add this question here below as I believe it can help others that are looking for this particular topic.

Following Rob's helpful answer, I managed to add multiple opacities, as specified previously.

However, some of my colleagues suggested a change that would improve the visualization of the map.

Now, in addition to multiple opacities (one for each trace, according to the value in the dataframe), I would also like to have multiple widths (according to the same value in the dataframe).

That is, following Rob's answer, I would need something like this:

BINS_FOR_OPACITY=10
opacity_a = np.geomspace(0.001,1, BINS_FOR_OPACITY)
BINS_FOR_WIDTH=10
width_a = np.geomspace(1,3, BINS_FOR_WIDTH)

fig = go.Figure()

# Note the double "for" statement that follows

for opacity, d in df.groupby(pd.cut(df["number_of_journeys"], bins=BINS_FOR_OPACITY, labels=opacity_a)):
    for width, d in df.groupby(pd.cut(df["number_of_journeys"], bins=BINS_FOR_WIDTH, labels=width_a)):
        fig.add_traces(
            go.Scattermapbox(
                name=f"{d['number_of_journeys'].mean():.2E}",
                lat=np.ravel(d.loc[:,[c for c in df.columns if "lat" in c or c=="none"]].values),
                lon=np.ravel(d.loc[:,[c for c in df.columns if "long" in c or c=="none"]].values),
                line_width=width,
                line_color="blue",
                opacity=opacity,
                mode="lines+markers",
            )
        )

However, the above is clearly not working, as it creates many more traces than it should (I can't really explain why, but I guess it is because of the double loop forced by the two for statements).

It occurred to me that some kind of solution could be hiding in the pd.cut part, as I would need something like a double cut, but I couldn't find a way to do it properly.

I also managed to create a Pandas series with:

widths = pd.cut(df["size"], bins=BINS_FOR_WIDTH, labels=width_a)

and iterated over that series, but got the same result as before (an excess of traces).

To clarify: I don't need only multiple opacities or only multiple widths. I need both at the same time, which is what is causing me trouble.

Again, any help is greatly appreciated.
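
One possible way to get both properties at once is sketched below. It is not from the answer; it assumes the same bin should drive opacity and width. The nested loops above run the full inner groupby once per outer group (up to 10 x 10 = 100 traces), and the inner `d` overwrites the outer one, so every width group is drawn once for each opacity value. Cutting once and using the integer bin index to look up both values avoids that. The sample data is made up, and the traces are built as plain dicts so the sketch runs without plotly installed.

```python
import numpy as np
import pandas as pd

BINS = 10
opacity_a = np.geomspace(0.001, 1, BINS)  # one opacity per bin
width_a = np.geomspace(1, 3, BINS)        # one width per bin

# Made-up data with the question's column names
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "lat_start_station": rng.uniform(51.4, 51.6, n),
    "long_start_station": rng.uniform(-0.3, 0.1, n),
    "lat_end_station": rng.uniform(51.4, 51.6, n),
    "long_end_station": rng.uniform(-0.3, 0.1, n),
    "number_of_journeys": rng.integers(1, 10**5, n),
})
df["none"] = None  # separator column, as in the answer

# labels=False makes pd.cut return the integer bin index (0 .. BINS-1) per row
bin_idx = pd.cut(df["number_of_journeys"], bins=BINS, labels=False)

traces = []
for i, d in df.groupby(bin_idx):
    traces.append(dict(
        type="scattermapbox",
        name=f"{d['number_of_journeys'].mean():.2E}",
        lat=np.ravel(d[["lat_start_station", "lat_end_station", "none"]].values),
        lon=np.ravel(d[["long_start_station", "long_end_station", "none"]].values),
        mode="lines",
        line=dict(color="blue", width=float(width_a[i])),  # width from the same bin
        opacity=float(opacity_a[i]),                       # opacity from the same bin
    ))
```

The list should be usable as `go.Figure(data=traces)`, which gives at most BINS traces. If opacity and width really need different bin counts, grouping by two bin-index series at once (a list of two groupers) would be the natural extension, at the cost of more traces.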

Upvotes: 0

Views: 826

Answers (1)

Rob Raymond

Reputation: 31226

  • opacity is per trace. For markers it can be set per point through color using rgba(a,b,c,d), but not for lines (the same holds for plain scatter plots)
  • to demonstrate, I have used London Underground stations (filtered to reduce the number of nodes), and gone to the extra effort of formatting the data as a CSV. The JSON source has nothing to do with the solution
  • number_of_journeys is binned so that each bin becomes one trace, with a geometric progression used to calculate the opacity of each bin
  • this sample data set generates 83k sample lines
import requests
import geopandas as gpd
import plotly.graph_objects as go
import itertools
import numpy as np
import pandas as pd
from pathlib import Path

# get geometry of london underground stations
gdf = gpd.GeoDataFrame.from_features(
    requests.get(
        "https://raw.githubusercontent.com/oobrien/vis/master/tube/data/tfl_stations.json"
    ).json()
)

# limit to zones 1-6 and to stations served by at least one line
gdf = gdf.loc[gdf["zone"].isin(["1","2","3","4","5","6"]) & gdf["lines"].apply(len).gt(0)].reset_index(
    drop=True
).rename(columns={"id":"tfl_id", "name":"id"})

# wanna join all valid combinations of stations...
combis = np.array(list(itertools.combinations(gdf.index, 2)))

# generate dataframe of all combinations of stations
gdf_c = (
    gdf.loc[combis[:, 0], ["geometry", "id"]]
    .assign(right=combis[:, 1])
    .merge(gdf.loc[:, ["geometry", "id"]], left_on="right", right_index=True, suffixes=("_start_station","_end_station"))
)


gdf_c["lat_start_station"] = gdf_c["geometry_start_station"].apply(lambda g: g.y)
gdf_c["long_start_station"] = gdf_c["geometry_start_station"].apply(lambda g: g.x)
gdf_c["lat_end_station"] = gdf_c["geometry_end_station"].apply(lambda g: g.y)
gdf_c["long_end_station"] = gdf_c["geometry_end_station"].apply(lambda g: g.x)

gdf_c = gdf_c.drop(
    columns=[
        "geometry_start_station",
        "right",
        "geometry_end_station",
    ]
).assign(number_of_journeys=np.random.randint(1,10**5,len(gdf_c)))

gdf_c
f = Path.cwd().joinpath("SO.csv")
gdf_c.to_csv(f, index=False)

# there's a requirement to start with a CSV even though no sample data has been provided, so now we're starting with a CSV
df = pd.read_csv(f)

# makes use of ravel simpler...
df["none"] = None

# now it's simple to generate scattermapbox... a trace per required opacity
BINS=10
opacity_a = np.geomspace(0.001,1, BINS)
fig = go.Figure()
for opacity, d in df.groupby(pd.cut(df["number_of_journeys"], bins=BINS, labels=opacity_a)):
    fig.add_traces(
        go.Scattermapbox(
            name=f"{d['number_of_journeys'].mean():.2E}",
            lat=np.ravel(d.loc[:,[c for c in df.columns if "lat" in c or c=="none"]].values),
            lon=np.ravel(d.loc[:,[c for c in df.columns if "long" in c or c=="none"]].values),
            line_color="blue",
            opacity=opacity,
            mode="lines+markers",
        )
    )

fig.update_layout(
    mapbox={
        "style": "carto-positron",
        "center": {'lat': 51.520214996769255, 'lon': -0.097792388774743},
        "zoom": 9,
    },
    margin={"l": 0, "r": 0, "t": 0, "b": 0},
)
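
To see what the binning step does on its own, the sketch below maps a few hypothetical journey counts to opacities with the same `pd.cut` and `np.geomspace` calls. The five sample values are invented for illustration.

```python
import numpy as np
import pandas as pd

BINS = 10
opacity_a = np.geomspace(0.001, 1, BINS)  # geometric progression of opacities

# Hypothetical journey counts
journeys = pd.Series([1, 500, 12_000, 48_000, 99_999], name="number_of_journeys")

# Equal-width bins over the observed range, each labelled with its opacity
binned = pd.cut(journeys, bins=BINS, labels=opacity_a)

summary = pd.DataFrame({
    "number_of_journeys": journeys,
    "opacity": binned.astype(float),
})
print(summary)
```

Because `pd.cut` with an integer `bins` uses equal-width intervals, counts that are close together (1 and 500 here) share a bin and therefore an opacity, while the geometric spacing keeps the low bins very faint.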


Upvotes: 1
