I am working on a dashboard analyzing the words spoken in The Office. I'm currently stuck on one part of my project: building a network graph that visualizes who speaks to whom in any particular episode of the show. The user selects a season, then an episode, then 2 characters for the network graph.
Here is my code so far:
import pandas as pd
import numpy as np
import dash
import os
from dash import dcc
from dash import html
import dash_bootstrap_components as dbc
from dash.dependencies import Input, Output
import visdcc
import itertools as it
from sklearn.feature_extraction.text import CountVectorizer
#Load data
sheet_url = 'https://docs.google.com/spreadsheets/d/18wS5AAwOh8QO95RwHLS95POmSNKA2jjzdt0phrxeAE0/edit#gid=747974534'
url = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')
office_data = pd.read_csv(url)
office_data['season'] = 'Season ' + office_data['season'].astype(str)
office_data['episode'] = 'Episode ' + office_data['episode'].astype(str)
office_data['scene'] = 'Scene ' + office_data['scene'].astype(str)
#-----------Network Graph Prep----------#
#1.) Filter down to just data with main characters
main_characters = ['Pam', 'Jan', 'Kelly', 'Phyllis', 'Angela', 'Erin', 'Holly', 'Meredith',
                   'Michael', 'Jim', 'Kevin', 'Oscar', 'Stanley', 'Toby', 'Roy', 'Ryan',
                   'Andy', 'Creed', 'Darryl', 'Dwight']
office_data['main_ind'] = np.where(office_data['speaker'].isin(main_characters), 1, 0)
#2.) Keep only scenes in which every speaker is a main character
size = office_data.groupby(['season','episode','scene']).size().reset_index()
sums = office_data.groupby(['season','episode','scene']).agg({'main_ind':'sum'}).reset_index()
main_metrics = pd.merge(size,sums,how='left',on=['season','episode','scene'])
main_metrics.rename(columns={0:'count'}, inplace=True )
office_data = pd.merge(office_data,main_metrics,how='left',on=['season','episode','scene'])
office_data['diff'] = office_data['count'] - office_data['main_ind_y']
data_for_ng = office_data[office_data['diff']==0]
#Create a season-character dictionary
season_character_dict = {'Season 1': ['Angela', 'Darryl', 'Dwight', 'Jan', 'Jim','Kelly','Kevin','Meredith','Michael','Oscar','Pam','Phyllis','Roy','Ryan','Stanley','Toby','Todd Packer'],
'Season 2': ['Angela','Creed', 'Darryl', 'David Wallace', 'Dwight', 'Jan', 'Jim','Kelly','Kevin','Meredith','Michael','Oscar','Pam','Phyllis','Roy','Ryan','Stanley','Toby','Todd Packer'],
'Season 3': ['Andy', 'Angela','Creed', 'Darryl', 'David Wallace', 'Dwight', 'Jan', 'Jim','Karen','Kelly','Kevin','Meredith','Michael','Oscar','Pam','Phyllis','Roy','Ryan','Stanley','Toby','Todd Packer'],
'Season 4': ['Andy', 'Angela','Creed', 'Darryl', 'David Wallace', 'Dwight','Holly', 'Jan', 'Jim','Kelly','Kevin','Meredith','Michael','Oscar','Pam','Phyllis','Roy','Ryan','Stanley','Toby'],
'Season 5': ['Andy', 'Angela','Creed', 'Darryl', 'David Wallace', 'Dwight','Erin','Holly', 'Jan', 'Jim','Karen','Kelly','Kevin','Meredith','Michael','Oscar','Pam','Phyllis','Roy','Ryan','Stanley','Toby'],
'Season 6': ['Andy', 'Angela','Creed', 'Darryl', 'David Wallace', 'Dwight','Erin','Gabe','Holly','Jan', 'Jim','Kelly','Kevin','Meredith','Michael','Oscar','Pam','Phyllis','Ryan','Stanley','Toby','Todd Packer'],
'Season 7': ['Andy', 'Angela','Creed', 'Darryl', 'David Wallace', 'Dwight','Erin','Gabe','Holly','Jan', 'Jim','Karen','Kelly','Kevin','Meredith','Michael','Oscar','Pam','Phyllis','Ryan','Stanley','Toby','Todd Packer'],
'Season 8': ['Andy', 'Angela','Creed', 'Darryl', 'David Wallace', 'Dwight','Erin','Gabe', 'Jim','Kelly','Kevin','Meredith','Oscar','Pam','Phyllis','Ryan','Stanley','Toby','Todd Packer'],
'Season 9': ['Andy', 'Angela','Creed', 'Darryl', 'David Wallace', 'Dwight','Erin','Gabe','Jan','Jim','Kelly','Kevin','Meredith','Michael','Oscar','Pam','Phyllis','Roy','Ryan','Stanley','Toby','Todd Packer']
}
season_episode_dict = {'Season 1': ['Episode 1', 'Episode 2', 'Episode 3', 'Episode 4', 'Episode 5','Episode 6'],
'Season 2': ['Episode 1', 'Episode 2', 'Episode 3', 'Episode 4', 'Episode 5','Episode 6','Episode 7', 'Episode 8', 'Episode 9', 'Episode 10', 'Episode 11','Episode 12','Episode 13', 'Episode 14', 'Episode 15', 'Episode 16', 'Episode 17','Episode 18','Episode 19', 'Episode 20', 'Episode 21', 'Episode 22'],
'Season 3': ['Episode 1', 'Episode 2', 'Episode 3', 'Episode 4', 'Episode 5','Episode 6','Episode 7', 'Episode 8', 'Episode 9', 'Episode 10', 'Episode 11','Episode 12','Episode 13', 'Episode 14', 'Episode 15', 'Episode 16', 'Episode 17','Episode 18','Episode 19', 'Episode 20', 'Episode 21', 'Episode 22', 'Episode 23'],
'Season 4': ['Episode 1', 'Episode 2', 'Episode 3', 'Episode 4', 'Episode 5','Episode 6','Episode 7', 'Episode 8', 'Episode 9', 'Episode 10', 'Episode 11','Episode 12','Episode 13', 'Episode 14'],
'Season 5': ['Episode 1', 'Episode 2', 'Episode 3', 'Episode 4', 'Episode 5','Episode 6','Episode 7', 'Episode 8', 'Episode 9', 'Episode 10', 'Episode 11','Episode 12','Episode 13', 'Episode 14', 'Episode 15', 'Episode 16', 'Episode 17','Episode 18','Episode 19', 'Episode 20', 'Episode 21', 'Episode 22', 'Episode 23','Episode 24','Episode 25','Episode 26'],
'Season 6': ['Episode 1', 'Episode 2', 'Episode 3', 'Episode 4', 'Episode 5','Episode 6','Episode 7', 'Episode 8', 'Episode 9', 'Episode 10', 'Episode 11','Episode 12','Episode 13', 'Episode 14', 'Episode 15', 'Episode 16', 'Episode 17','Episode 18','Episode 19', 'Episode 20', 'Episode 21', 'Episode 22', 'Episode 23','Episode 24'],
'Season 7': ['Episode 1', 'Episode 2', 'Episode 3', 'Episode 4', 'Episode 5','Episode 6','Episode 7', 'Episode 8', 'Episode 9', 'Episode 10', 'Episode 11','Episode 12','Episode 13', 'Episode 14', 'Episode 15', 'Episode 16', 'Episode 17','Episode 18','Episode 19', 'Episode 20', 'Episode 21', 'Episode 22', 'Episode 23','Episode 24'],
'Season 8': ['Episode 1', 'Episode 2', 'Episode 3', 'Episode 4', 'Episode 5','Episode 6','Episode 7', 'Episode 8', 'Episode 9', 'Episode 10', 'Episode 11','Episode 12','Episode 13', 'Episode 14', 'Episode 15', 'Episode 16', 'Episode 17','Episode 18','Episode 19', 'Episode 20', 'Episode 21', 'Episode 22', 'Episode 23','Episode 24'],
'Season 9': ['Episode 1', 'Episode 2', 'Episode 3', 'Episode 4', 'Episode 5','Episode 6','Episode 7', 'Episode 8', 'Episode 9', 'Episode 10', 'Episode 11','Episode 12','Episode 13', 'Episode 14', 'Episode 15', 'Episode 16', 'Episode 17','Episode 18','Episode 19', 'Episode 20', 'Episode 21', 'Episode 22', 'Episode 23']
}
character_choices = office_data['speaker'].sort_values().unique()
season_choices = office_data['season'].sort_values().unique()
episode_choices = office_data['episode'].sort_values().unique()
app = dash.Dash(__name__,assets_folder=os.path.join(os.curdir,"assets"))
server = app.server
app.layout = html.Div([
    dbc.Row([
        dbc.Col(
            dcc.Dropdown(
                id='dropdown4',
                options=[{'label': i, 'value': i} for i in season_choices],
                value=season_choices[0]
            ), width=3
        ),
        dbc.Col(
            dcc.Dropdown(
                id='dropdown7',
                options=[{'label': i, 'value': i} for i in episode_choices],
                value=episode_choices[0]
            ), width=3
        ),
        dbc.Col(
            dcc.Dropdown(
                id='dropdown5',
                options=[{'label': i, 'value': i} for i in character_choices],
                value=character_choices[0]
            ), width=3
        ),
        dbc.Col(
            dcc.Dropdown(
                id='dropdown6',
                options=[{'label': i, 'value': i} for i in character_choices],
                value=character_choices[1]
            ), width=3
        )
    ]),
    dbc.Row([
        dbc.Col(
            visdcc.Network(
                id='net',
                options=dict(
                    height='600px',
                    width='100%',
                    physics={'barnesHut': {'avoidOverlap': 0.5}},
                    maxVelocity=0,
                    stabilization={
                        'enabled': True,
                        'iterations': 15,
                        'updateInterval': 50,
                        'onlyDynamicEdges': False,
                        'fit': True
                    },
                )
            )
        )
    ])
])
@app.callback(
    Output('dropdown5', 'options'),
    Output('dropdown5', 'value'),
    Input('dropdown4', 'value')  #--> choose season
)
def set_character_options1(selected_season):
    characters = season_character_dict[selected_season]
    return [{'label': i, 'value': i} for i in characters], characters[0]

@app.callback(
    Output('dropdown6', 'options'),
    Output('dropdown6', 'value'),
    Input('dropdown4', 'value')  #--> choose season
)
def set_character_options2(selected_season):
    characters = season_character_dict[selected_season]
    return [{'label': i, 'value': i} for i in characters], characters[1]

@app.callback(
    Output('dropdown7', 'options'),  #--> filter episodes
    Output('dropdown7', 'value'),
    Input('dropdown4', 'value')  #--> choose season
)
def set_episode_options(selected_season):
    episodes = season_episode_dict[selected_season]
    return [{'label': i, 'value': i} for i in episodes], episodes[0]
@app.callback(
    Output('net', 'data'),
    Input('dropdown4', 'value'),
    Input('dropdown7', 'value'),
    Input('dropdown5', 'value'),
    Input('dropdown6', 'value'),
)
def network(season_select, episode_select, character_select1, character_select2):
    filtered = data_for_ng[['season', 'episode', 'scene', 'speaker']]
    filtered = filtered[filtered['season'] == season_select]
    filtered = filtered[filtered['episode'] == episode_select]

    def assets_pairs(speakers):
        unique_speakers = set(speakers)
        if len(unique_speakers) == 1:
            x = speakers.iat[0]  # get the only unique speaker
            pairs = [[x, x]]
        else:
            pairs = it.permutations(unique_speakers, r=2)  # get all the unique pairs without repeated elements
        return pd.DataFrame(pairs, columns=['Source', 'Target'])

    df_pairs = (
        filtered.groupby(['season', 'episode', 'scene'])['speaker']
        .apply(assets_pairs)  # create speaker pairs per scene
        .groupby(['Source', 'Target'], as_index=False)  # compute the weights by
        .agg(Weights=('Source', 'size'))  # counting the unique ('Source', 'Target') pairs
    )
    new_df = df_pairs[(df_pairs['Source'] == character_select1) | (df_pairs['Source'] == character_select2)]
    node_list = list(
        set(new_df['Source'].unique().tolist() + new_df['Target'].unique().tolist())
    )
    nodes = [
        {
            'id': node_name,
            'label': node_name,
            #'color': #i_dont_know_what_to_put_here,
            'shape': 'dot',
            'size': 15
        }
        for node_name in node_list
    ]
    # Create edges from df
    edges = []
    for row in new_df.to_dict(orient='records'):
        source, target = row['Source'], row['Target']
        edges.append({
            'id': source + "__" + target,
            'from': source,
            'to': target,
            'width': 2
        })
    data = {'nodes': nodes, 'edges': edges}
    return data
if __name__ == '__main__':
    app.run_server(host='0.0.0.0', port=8051)
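The pair-counting step inside network() can be checked on its own. Below is a minimal sketch on a small, made-up toy DataFrame (the scene and speaker values are hypothetical); it counts pairs with groupby(...).size() instead of the named aggregation, which yields the same per-pair row counts:

```python
import itertools as it

import pandas as pd


def speaker_pairs(speakers: pd.Series) -> pd.DataFrame:
    """All ordered (Source, Target) pairs of distinct speakers in one scene."""
    unique_speakers = sorted(set(speakers))
    if len(unique_speakers) == 1:
        # A one-speaker scene yields a single self-pair, as in the question's code
        pairs = [(unique_speakers[0], unique_speakers[0])]
    else:
        pairs = list(it.permutations(unique_speakers, r=2))
    return pd.DataFrame(pairs, columns=['Source', 'Target'])


# Hypothetical toy data: one row per spoken line
toy = pd.DataFrame({
    'scene': ['Scene 1', 'Scene 1', 'Scene 1', 'Scene 2',
              'Scene 2', 'Scene 3', 'Scene 4', 'Scene 4'],
    'speaker': ['Michael', 'Dwight', 'Michael', 'Michael',
                'Jim', 'Pam', 'Dwight', 'Michael'],
})

# One block of pairs per scene, then count how many scenes share each pair
pairs = toy.groupby('scene')['speaker'].apply(speaker_pairs)
weights = pairs.groupby(['Source', 'Target']).size().reset_index(name='Weights')
print(weights)
```

Here Michael and Dwight share two scenes, so both (Dwight, Michael) and (Michael, Dwight) get a weight of 2.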
The issue I'm having is that when there are lots of connections, it's hard to see where the source nodes (the 2 selected characters) are. So I want to change the color of those nodes to make the diagrams easier to interpret. However, I haven't been able to figure out how to change the color of specific nodes; so far it only seems possible to change the color of all nodes at once.
I've found the vis-network documentation page for nodes, but I'm not sure how to apply what it describes: https://visjs.github.io/vis-network/docs/network/nodes.html#
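The nodes page linked above documents a per-node color option that accepts either a colour string or an object with background, border, and highlight keys. Because each entry in the nodes list is its own dict, the colour can be set node by node. A minimal sketch; make_node and the hex values are illustrative choices, not part of the question's code:

```python
# Hypothetical helper: build one vis-network node dict for visdcc.
# A 'color' set on a node applies to that node only.
def make_node(name, highlighted):
    if highlighted:
        # Selected character: red fill, darker red border
        color = {
            'background': '#e74c3c',
            'border': '#922b21',
            'highlight': {'background': '#f1948a', 'border': '#922b21'},
        }
    else:
        # Every other character: light blue fill
        color = {
            'background': '#97c2fc',
            'border': '#2b7ce9',
            'highlight': {'background': '#d2e5ff', 'border': '#2b7ce9'},
        }
    return {'id': name, 'label': name, 'shape': 'dot', 'size': 15, 'color': color}


# Example: highlight the two characters chosen in the dropdowns
selected = {'Michael', 'Dwight'}
node_list = ['Michael', 'Dwight', 'Jim', 'Pam']
nodes = [make_node(name, name in selected) for name in node_list]
```

Inside network(), the same idea would use selected = {character_select1, character_select2} when building the nodes list.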
Can someone help me figure out how to get this little part of the diagram working properly? Any help would be appreciated!
Thank you!
You just need to distinguish the nodes of the two selected characters from the other ones. A quick fix is to add an if/else condition when creating the nodes list, like so:
nodes = [
    {
        'id': node_name,
        'label': node_name,
        # red for the two selected characters, green for everyone else
        'color': 'red' if node_name in (character_select1, character_select2) else 'green',
        'shape': 'dot',
        'size': 15
    }
    for node_name in node_list
]
This colours the nodes of the two selected characters red and all other nodes green.
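As an alternative to hard-coding a colour on every node, vis-network also supports named groups: each node carries a 'group' key, and the style for each group is defined once in the network options. A minimal sketch, assuming visdcc forwards its options dict to vis-network unchanged; build_nodes and network_options are hypothetical names:

```python
def build_nodes(node_list, selected):
    """Tag each node with a vis-network group instead of a hard-coded colour."""
    return [
        {
            'id': name,
            'label': name,
            'shape': 'dot',
            'size': 15,
            'group': 'selected' if name in selected else 'other',
        }
        for name in node_list
    ]


# Group styles, defined once and passed as visdcc.Network(options=network_options)
network_options = dict(
    height='600px',
    width='100%',
    groups={
        'selected': {'color': 'red'},
        'other': {'color': 'green'},
    },
)

# Example: in network() this would be build_nodes(node_list, {character_select1, character_select2})
nodes = build_nodes(['Michael', 'Jim', 'Pam'], {'Michael', 'Pam'})
```

Keeping the styling in one place means the colours can be changed without touching the callback.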