Trenton McKinney

Reputation: 62453

How to plot multi-index, categorical data?

Given the following data:

DC,Mode,Mod,Ven,TY1,TY2,TY3,TY4,TY5,TY6,TY7,TY8
Intra,S,Dir,C1,False,False,False,False,False,True,True,False
Intra,S,Co,C1,False,False,False,False,False,False,False,False
Intra,M,Dir,C1,False,False,False,False,False,False,True,False
Inter,S,Co,C1,False,False,False,False,False,False,False,False
Intra,S,Dir,C2,False,True,True,True,True,True,True,False
Intra,S,Co,C2,False,False,False,False,False,False,False,False
Intra,M,Dir,C2,False,False,False,False,False,False,False,False
Inter,S,Co,C2,False,False,False,False,False,False,False,False
Intra,S,Dir,C3,False,False,False,False,True,True,False,False
Intra,S,Co,C3,False,False,False,False,False,False,False,False
Intra,M,Dir,C3,False,False,False,False,False,False,False,False
Inter,S,Co,C3,False,False,False,False,False,False,False,False
Intra,S,Dir,C4,False,False,False,False,False,True,False,True
Intra,S,Co,C4,True,True,True,True,False,True,False,True
Intra,M,Dir,C4,False,False,False,False,False,True,False,True
Inter,S,Co,C4,True,True,True,False,False,True,False,True
Intra,S,Dir,C5,True,True,False,False,False,False,False,False
Intra,S,Co,C5,False,False,False,False,False,False,False,False
Intra,M,Dir,C5,True,True,False,False,False,False,False,False
Inter,S,Co,C5,False,False,False,False,False,False,False,False

Imports:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

To reproduce my DataFrame, copy the data then use:

df = pd.read_clipboard(sep=',')
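If the clipboard isn't available (for example on a headless machine), the same DataFrame can probably be built from a string instead. A minimal sketch with only the header and first four rows; `csv_text` is an illustrative name, not part of the original post:

```python
import io
import pandas as pd

# Illustrative subset of the data above (header plus the first four rows);
# paste all rows to get the full DataFrame.
csv_text = """DC,Mode,Mod,Ven,TY1,TY2,TY3,TY4,TY5,TY6,TY7,TY8
Intra,S,Dir,C1,False,False,False,False,False,True,True,False
Intra,S,Co,C1,False,False,False,False,False,False,False,False
Intra,M,Dir,C1,False,False,False,False,False,False,True,False
Inter,S,Co,C1,False,False,False,False,False,False,False,False
"""

# read_csv parses the True/False fields as booleans
df = pd.read_csv(io.StringIO(csv_text))
```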

I'd like to create a plot conveying the same information as my example, but not necessarily with the same shape (I'm open to suggestions). I'd also like to hover over a colored cell and have the appropriate Ven displayed (e.g. C1, not 1).

Edit 2018-10-17:

The two solutions provided so far are helpful, and each accomplishes a different aspect of what I'm looking for. However, the key issue I'd like to resolve, which wasn't explicitly stated prior to this edit, is the following:

I would like to perform the plotting without converting Ven to an int; this numeric transformation isn't practical with the real data. So the actual scope of the question is to plot all categorical data with two categorical axes.


The issue I'm experiencing is the data is categorical and the y-axis is multi-indexed.

I've done the following to transform the DataFrame:

# replace False with NaN
df = df.replace(False, np.nan)

# replace True with a number representing Ven (e.g. C1 = 1)
def rep_ven(row):
    return row.iloc[4:].replace(True, int(row.Ven[1]))

df.iloc[:, 4:] = df.apply(rep_ven, axis=1)

# drop the Ven column
df = df.drop(columns=['Ven'])

# set multi-index
df_m = df.set_index(['DC', 'Mode', 'Mod'])

Plotting the transformed DataFrame produces:

plt.figure(figsize=(20,10))
heatmap = plt.imshow(df_m)
plt.xticks(range(len(df_m.columns.values)), df_m.columns.values)
plt.yticks(range(len(df_m.index)), df_m.index)
plt.show()


This plot isn't very streamlined: there are four y-axis values for each Ven. This is only a subset of the data, so the graph would be very long with all of it.

Upvotes: 0

Views: 457

Answers (2)

rje

Reputation: 6428

Here's my solution. Instead of plotting, I apply a style to the DataFrame; see https://pandas.pydata.org/pandas-docs/stable/style.html

# Transform Ven values from "C1", "C2" to "1", "2", ...
# (takes the second character, so this assumes single-digit labels)
df['Ven'] = df['Ven'].str[1]

# Given a specific combination of dc, mode, mod, ven, 
# do we have any True cells?
g = df.groupby(['DC', 'Mode', 'Mod', 'Ven']).any()

# Let's drop any rows with only False values
g = g[g.any(axis=1)]

# Convert True, False to 1, 0
g = g.astype(int)

# Get the values of the ven index as an int array
# Note: we don't want to drop the ven index!!
# Otherwise styling won't work
ven = g.index.get_level_values('Ven').values.astype(int)

# Multiply 1 and 0 with Ven value
g = g.mul(ven, axis=0)

# Sort the index
g.sort_index(ascending=False, inplace=True)

# Now display the dataframe with styling

# first we get a color map
import matplotlib
# matplotlib >= 3.5; on older versions use matplotlib.cm.get_cmap('tab10')
cmap = matplotlib.colormaps['tab10']

def apply_color_map(val):
    # hide the 0 values
    if val == 0:
        return 'color: white; background-color: white'
    # for non-zero: get color from cmap, convert to hexcode for css
    return 'color: white; background-color: ' + matplotlib.colors.rgb2hex(cmap(int(val)))

# on pandas >= 2.1, use g.style.map(apply_color_map) instead
g.style.applymap(apply_color_map)

The available matplotlib colormaps are listed in the matplotlib documentation under "Colormap reference", with some additional explanation under "Choosing a colormap".
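If converting Ven to a number isn't practical, the styling could instead be keyed on the Ven label itself. This is a rough sketch under stated assumptions, not the method used above: the small frame, the `ven_colors` mapping, and the `color_row` helper are all illustrative names and values.

```python
import pandas as pd

# Small illustrative frame in the same layout as the question's data
df = pd.DataFrame({
    'DC': ['Intra', 'Intra', 'Inter'],
    'Mode': ['S', 'S', 'S'],
    'Mod': ['Dir', 'Dir', 'Co'],
    'Ven': ['C1', 'C2', 'C4'],
    'TY1': [False, False, True],
    'TY2': [False, True, True],
})
ty_cols = ['TY1', 'TY2']

# Hypothetical label-to-color mapping; any CSS colors work
ven_colors = {'C1': '#1f77b4', 'C2': '#ff7f0e', 'C4': '#d62728'}

# Keep Ven in the index so each row knows its own label
g = df.set_index(['DC', 'Mode', 'Mod', 'Ven'])[ty_cols]
g = g[g.any(axis=1)]  # drop rows that are all False

def color_row(row):
    # row.name is the index tuple (DC, Mode, Mod, Ven); Ven is the last entry
    color = ven_colors[row.name[-1]]
    return [f'color: {color}; background-color: {color}' if v
            else 'color: white; background-color: white' for v in row]

# Preview the CSS string computed for each cell, without rendering anything
css = g.apply(color_row, axis=1, result_type='expand')
```

In a notebook, `g.style.apply(color_row, axis=1)` should render the table with these styles. The text color matches the background so that the True/False values are hidden and only the colored cells remain.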

The result

Upvotes: 1

Ernest Li

Reputation: 128

Explanation: Remove the rows where TY1-TY8 are all NaN before creating your plot. Refer to this answer as a starting point for creating interactive annotations to display Ven.

The code below should work:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_clipboard(sep=',')

# replace False with NaN
df = df.replace(False, np.nan)

# replace True with a number representing Ven (e.g. C1 = 1)
def rep_ven(row):
    return row.iloc[4:].replace(True, int(row.Ven[1]))

df.iloc[:, 4:] = df.apply(rep_ven, axis=1)

# drop the Ven column
df = df.drop(columns=['Ven'])

idx = df[['TY1','TY2', 'TY3', 'TY4','TY5','TY6','TY7','TY8']].dropna(thresh=1).index.values
df = df.loc[idx,:].sort_values(by=['DC', 'Mode','Mod'], ascending=False)

# set multi-index
df_m = df.set_index(['DC', 'Mode', 'Mod'])


plt.figure(figsize=(20,10))
heatmap = plt.imshow(df_m)
plt.xticks(range(len(df_m.columns.values)), df_m.columns.values)
plt.yticks(range(len(df_m.index)), df_m.index)
plt.show()
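For the hover part, one possible starting point is to override `ax.format_coord`, which interactive matplotlib backends call to fill the status bar while the cursor moves. The sketch below is an assumption-laden variation, not the code above: it uses `pd.factorize` so the integer codes serve only for coloring while the original Ven labels are kept for display, and the small frame and the `format_coord` helper are illustrative.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Small illustrative frame in the same layout as the question's data
df = pd.DataFrame({
    'DC': ['Intra', 'Intra', 'Inter'],
    'Mode': ['S', 'M', 'S'],
    'Mod': ['Dir', 'Dir', 'Co'],
    'Ven': ['C1', 'C2', 'C4'],
    'TY1': [False, False, True],
    'TY2': [True, False, True],
})
ty_cols = ['TY1', 'TY2']

# Drop rows where every TY column is False
df = df[df[ty_cols].any(axis=1)].reset_index(drop=True)

# Integer codes are used only for coloring; `labels` keeps the original names
codes, labels = pd.factorize(df['Ven'])

# True cells get the row's Ven code, False cells become NaN (left blank by imshow)
grid = np.where(df[ty_cols].to_numpy(), codes[:, None], np.nan)

fig, ax = plt.subplots()
# tab10 has 10 colors, so this covers up to 10 distinct Ven values
ax.imshow(grid, cmap='tab10', vmin=0, vmax=9)
ax.set_xticks(range(len(ty_cols)))
ax.set_xticklabels(ty_cols)
ax.set_yticks(range(len(df)))
ax.set_yticklabels(list(df[['DC', 'Mode', 'Mod']].agg(' / '.join, axis=1)))

def format_coord(x, y):
    # imshow centers cell (row, col) at (x=col, y=row), so round to the nearest cell
    col, row = int(round(x)), int(round(y))
    inside = 0 <= row < grid.shape[0] and 0 <= col < grid.shape[1]
    if inside and not np.isnan(grid[row, col]):
        return f'{ty_cols[col]}: Ven={labels[int(grid[row, col])]}'
    return ''

# Interactive backends show this text in the status bar while hovering
ax.format_coord = format_coord
```

With an interactive backend, calling `plt.show()` afterwards should display the Ven label (e.g. C1) rather than the numeric code when the cursor is over a colored cell.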


Upvotes: 1
