Reputation: 314
I have a dataframe with the following structure:
INDEX | ANO | DISTRITO | CONCELHO | NCCO |
---|---|---|---|---|
0 | 2013.0 | Aveiro | Albergaria-a-Velha | 98 |
1 | 2013.0 | Aveiro | Albergaria-a-velha | 1 |
2 | 2013.0 | Aveiro | Anadia | 41 |
The full dataset can be found here.
This dataset ranges from 2013 to 2022 (`ANO`), and includes 18 different districts (`DISTRITO`), 278 different counties (`CONCELHO`) and the number of forest fires per `CONCELHO` (`NCCO`).
I'm able to produce a one-step Sankey graph with this code, which I adapted from here:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

df = pd.read_csv('heatmap_full.csv') #generated by ingestor.py
# Node labels: every year followed by every district (duplicates included)
all_nodes = df.ANO.values.tolist() + df.DISTRITO.values.tolist()
# list.index returns the first match, so repeated labels map to a single node
source_indices = [all_nodes.index(ANO) for ANO in df.ANO]
target_indices = [all_nodes.index(DISTRITO) for DISTRITO in df.DISTRITO]
colors = px.colors.qualitative.D3
node_colors = [np.random.choice(colors) for node in all_nodes]
fig = go.Figure(data=[go.Sankey(
    # Define nodes
    node = dict(
        pad = 20,
        thickness = 20,
        line = dict(color = "black", width = 1.0),
        label = all_nodes,
        color = node_colors,
    ),
    # Add links
    link = dict(
        source = source_indices,
        target = target_indices,
        value = df.NCCO,
    ))])
fig.update_layout(title_text="FOREST FIRES IN PORTUGAL",
                  height = 900,
                  width=1200,
                  font_size=18)
fig.show()
My Problem/Question
I would like to have a step after `DISTRITO` for `CONCELHO` appearing in the Sankey graph, but I can't figure it out.
Can I add a new trace to the figure? Do I need to treat my original dataset in another way?
Any help would be much appreciated.
Disclosure: This is not meant for commercial use.
Upvotes: 0
Views: 2318
Reputation: 31146
import pandas as pd
import numpy as np
import plotly.graph_objects as go

df_in = pd.read_csv("https://raw.githubusercontent.com/vostpt/ICNF_DATA/main/heatmap_full.csv")
# too much data to plot at once, so work on a random sample of 100 rows
df_in = df_in.sample(100)
# where a county has the same name as its district, add a suffix so they become distinct nodes
df_in["CONCELHO"] = np.where(df_in["DISTRITO"]==df_in["CONCELHO"], df_in["CONCELHO"]+"_c", df_in["CONCELHO"])
# normalise case so that spellings such as "Albergaria-a-Velha" and "Albergaria-a-velha" collapse into one node
df_in["CONCELHO"] = df_in["CONCELHO"].str.capitalize()

# first step: ANO -> DISTRITO
df = df_in.groupby(["ANO","DISTRITO"], as_index=False)["NCCO"].sum().rename(columns={"ANO":"source","DISTRITO":"target","NCCO":"value"})
# ANO is read as a float (2013.0), so convert it to a clean string label
df["source"] = df["source"].astype(int).astype(str)
# second step: DISTRITO -> CONCELHO, appended to the same source/target/value table
df = pd.concat([df, df_in.groupby(["DISTRITO","CONCELHO"], as_index=False)["NCCO"].sum().rename(columns={"DISTRITO":"source","CONCELHO":"target", "NCCO":"value"})])

# map every distinct node label to an integer index, as go.Sankey expects
nodes = np.unique(df[["source","target"]], axis=None)
nodes = pd.Series(index=nodes, data=range(len(nodes)))

fig = go.Figure(
    go.Sankey(
        node={"label": nodes.index},
        link={
            "source": nodes.loc[df["source"]],
            "target": nodes.loc[df["target"]],
            "value": df["value"],
        },
    )
)
fig.show()
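
The same multi-step idea can also be sketched without pandas, which makes the node and link bookkeeping explicit. This is only an illustrative sketch, not part of the original answer: the helper name `build_sankey_links` is made up, the third sample row is invented for demonstration, and it keeps equal names in different columns apart by keying nodes on (column, label) instead of adding a `_c` suffix.

```python
from collections import defaultdict


def build_sankey_links(rows, levels, value_key):
    """Aggregate rows into Sankey links between each pair of consecutive levels.

    Hypothetical helper: returns (labels, source, target, value) lists.
    """
    totals = defaultdict(int)
    for row in rows:
        # Key each node by (column, label) so that a district and a county
        # sharing the same name remain two separate nodes.
        keys = [(level, str(row[level])) for level in levels]
        for src, dst in zip(keys, keys[1:]):
            totals[(src, dst)] += row[value_key]

    # Assign one integer index per distinct node, in order of first appearance.
    node_index = {}
    for src, dst in totals:
        for key in (src, dst):
            node_index.setdefault(key, len(node_index))

    labels = [label for _, label in node_index]
    source = [node_index[src] for src, _ in totals]
    target = [node_index[dst] for _, dst in totals]
    value = list(totals.values())
    return labels, source, target, value


# Sample rows shaped like the question's table; the last row is invented.
rows = [
    {"ANO": 2013, "DISTRITO": "Aveiro", "CONCELHO": "Albergaria-a-Velha", "NCCO": 98},
    {"ANO": 2013, "DISTRITO": "Aveiro", "CONCELHO": "Anadia", "NCCO": 41},
    {"ANO": 2014, "DISTRITO": "Aveiro", "CONCELHO": "Aveiro", "NCCO": 7},
]
labels, source, target, value = build_sankey_links(
    rows, ["ANO", "DISTRITO", "CONCELHO"], "NCCO"
)
```

The four lists can then be passed to plotly as `go.Sankey(node={"label": labels}, link={"source": source, "target": target, "value": value})`. Adding a further step should only require appending another column name to `levels`.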
Upvotes: 1