Slaven Glumac

Reputation: 553

Parallel coordinates plot with skipped coordinates

People race on 100 m, 400 m, and 1600 m tracks, and their finish times are recorded. I want to present the data for each racer in a parallel coordinates plot. Some racers may not finish a track. In that case I would like to mark it somehow, either with a point at infinity or with a special color for that track.

As an example, I drew a parallel coordinates plot in Paint. Lazyman hasn't finished the 1600 m track, and this is marked with an x.

An example data set is given in the following "racing.csv":

RACER,TRACK.100m,TRACK.400m,TRACK.1600m
Superman,0.1,0.5,1
Lazyman,200,900,Inf

I have tried a solution with pandas:

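import pandas
# parallel_coordinates lives in pandas.plotting in current pandas releases
from pandas.plotting import parallel_coordinates
import matplotlib.pyplot as plt

# 'Inf' in the CSV is parsed as a float infinity
d = pandas.read_csv('racing.csv')

f = plt.figure()
parallel_coordinates(d, 'RACER')
# finish times span several orders of magnitude
f.axes[0].set_yscale('log')

plt.show()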

This gives a plot without the Inf value for Lazyman at 1600 m.

I also prepared a CSV in long format for ggplot, "racing2.csv" (there may be a better way to do this):

RACER,TRACK,TIME
Superman,100m,0.1
Superman,400m,0.5
Superman,1600m,1
Lazyman,100m,200
Lazyman,400m,900
Lazyman,1600m,Inf

Using ggplot:

library(ggplot2)
d <- read.csv('racing2.csv')
g <- ggplot(d) + geom_line(aes(x=TRACK,y=TIME,group=RACER, color=RACER))
g <- g + scale_y_log10()
ggsave('ggplot.png', plot = g)

I got closer: this shows the infinite value, but it is not annotated in any way.

Any solution, either Python or R, will be appreciated. Also, suggestions regarding marking unfinished races are appreciated.

Upvotes: 0

Views: 480

Answers (1)

GGamba

Reputation: 13680

With R and ggplot2:

Build some bogus data:

df <- data.frame(ID = factor(c(rep(1, 3), rep(2, 3), rep(3, 3)), labels = c('Realman', 'Lazyman', 'Superman')),
             race = factor(rep(seq(1,3,1), 3), labels = c('100m', '400m', '1600m')),
             runTime = c(8.9, 20.5, 150.9, 100.1, 300.3, +Inf, 1.2, 5, +Inf))

#         ID  race runTime
# 1  Realman  100m     8.9
# 2  Realman  400m    20.5
# 3  Realman 1600m   150.9
# 4  Lazyman  100m   100.1
# 5  Lazyman  400m   300.3
# 6  Lazyman 1600m     Inf
# 7 Superman  100m     1.2
# 8 Superman  400m     5.0
# 9 Superman 1600m     Inf

Code:

library(ggplot2)
library(dplyr)         # provides filter()
library(RColorBrewer)  # provides brewer.pal()

ggplot(filter(df, runTime != +Inf), aes(x = race, y = runTime, group = ID, color = ID)) + 
    geom_line(size = 2) +
    geom_point(size = 4) +

    geom_line(data = df, linetype = 'dashed', size = 1) +        
    geom_point(data = df, shape = 21, size = 1) +

    geom_text(aes(label = runTime), position = position_nudge(y = -.1)) +

    scale_y_continuous(trans = 'log10', breaks = c(1, 10, 100, 1000)) +
    scale_x_discrete('Track') +
    scale_color_manual('Racer', values = brewer.pal(length(levels(df$ID)), 'Set1')) +

    theme(panel.background = element_blank(),
          panel.grid.major.x = element_line(colour = 'lightgrey', size = 25),
          legend.position = 'top',
          axis.line.y = element_line('black', .5, arrow = arrow()))
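
For Python, a similar effect could probably be reached with plain matplotlib. The following is only a minimal sketch and not a translation of the ggplot code: instead of letting the line run off to infinity, it assumes unfinished races are drawn at a finite "did not finish" level (ten times the slowest finite time, an arbitrary choice) and marked with an x. The names dnf_level, cap_times and plot_racers are made up for this example, and the times are the bogus data from above.

```python
import math

TRACKS = ["100m", "400m", "1600m"]
# Bogus times from the data frame above; math.inf marks an unfinished race.
RACERS = {
    "Realman": [8.9, 20.5, 150.9],
    "Lazyman": [100.1, 300.3, math.inf],
    "Superman": [1.2, 5.0, math.inf],
}


def dnf_level(racers, factor=10.0):
    """Finite y position for unfinished races (assumption: factor * slowest finite time)."""
    finite = [t for times in racers.values() for t in times if math.isfinite(t)]
    return factor * max(finite)


def cap_times(times, level):
    """Replace non-finite times by `level` and return the replaced indices as well."""
    capped = [t if math.isfinite(t) else level for t in times]
    missing = [i for i, t in enumerate(times) if not math.isfinite(t)]
    return capped, missing


def plot_racers(racers, tracks, path="racing.png"):
    """Draw one line per racer on a log axis and mark unfinished races with an x."""
    # Imported here so the helpers above can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    level = dnf_level(racers)
    xs = list(range(len(tracks)))
    fig, ax = plt.subplots()
    # Reference line showing where the "did not finish" markers sit.
    ax.axhline(level, color="lightgrey", linestyle="--")
    for name, times in racers.items():
        capped, missing = cap_times(times, level)
        (line,) = ax.plot(xs, capped, marker="o", label=name)
        # Overlay an x in the racer's color on every unfinished race.
        ax.plot(missing, [level] * len(missing), linestyle="none",
                marker="x", markersize=12, color=line.get_color())
    ax.set_yscale("log")
    ax.set_xticks(xs)
    ax.set_xticklabels(tracks)
    ax.set_xlabel("Track")
    ax.set_ylabel("Time")
    ax.legend(title="Racer")
    fig.savefig(path)
    plt.close(fig)
```

Calling plot_racers(RACERS, TRACKS) should write racing.png. Capping at a finite level keeps the unfinished points on the log axis, at the cost of a y position that has no meaning as a time, so it is probably worth labelling that level in the final figure.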

Upvotes: 2
