Reputation: 553
People race on 100 m, 400 m and 1600 m tracks, and their finish times are recorded. I want to present the data for each racer in a parallel coordinates plot. Some racers may not finish a track. In that case I would like to mark it somehow, either with a point at infinity or with a distinct color for that track.
As an example, I sketched the plot I want in Paint:
Lazyman hasn't finished the 1600m track, and this is marked with an x.
An example data set is given in the following "racing.csv":
RACER,TRACK.100m,TRACK.400m,TRACK.1600m
Superman,0.1,0.5,1
Lazyman,200,900,Inf
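Before plotting, it can help to locate the unfinished races explicitly. A minimal pandas sketch, assuming the last column is TRACK.1600m (the text says 1600 m) and embedding the data as a string so it runs on its own; the names times, unfinished and dnf are illustrative:

```python
import io

import numpy as np
import pandas as pd

# In-memory copy of racing.csv (header assumed to use 1600m, as in the text)
RACING_CSV = """RACER,TRACK.100m,TRACK.400m,TRACK.1600m
Superman,0.1,0.5,1
Lazyman,200,900,Inf
"""

times = pd.read_csv(io.StringIO(RACING_CSV)).set_index('RACER')
# float('Inf') is valid Python, so this yields inf even if the parser left 'Inf' as text
times = times.astype(float)

unfinished = np.isinf(times)  # DataFrame of booleans, same shape as times
dnf = [(racer, track)
       for racer, row in unfinished.iterrows()
       for track, flag in row.items()
       if flag]
print(dnf)  # → [('Lazyman', 'TRACK.1600m')]
```

The boolean frame can then drive the plotting, for example to choose a marker or color per track.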
I have tried a solution with pandas:
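import pandas
from pandas.plotting import parallel_coordinates  # older pandas: pandas.tools.plotting
import matplotlib.pyplot as plt
d = pandas.read_csv('racing.csv')
f = plt.figure()
parallel_coordinates(d, 'RACER')  # one line per racer, one vertical axis per track
f.axes[0].set_yscale('log')       # log scale so 0.1 and 900 fit on the same axis
plt.show()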
This gives a plot in which Lazyman's Inf value at 1600m is simply not drawn.
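Since the log-scaled line simply stops at the non-finite value, one workaround is to draw unfinished races on an extra "DNF" level above the slowest finish time and mark them with an x, much like the Paint sketch. The sketch below uses plain matplotlib instead of parallel_coordinates; the DNF level, the hypothetical dnf_factor parameter and the inline copy of racing.csv (with a 1600m header) are assumptions made for illustration:

```python
import io

import matplotlib
matplotlib.use('Agg')  # headless backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# In-memory copy of racing.csv so the sketch runs on its own
RACING_CSV = """RACER,TRACK.100m,TRACK.400m,TRACK.1600m
Superman,0.1,0.5,1
Lazyman,200,900,Inf
"""


def plot_with_dnf(times, dnf_factor=10.0):
    """Parallel-coordinates style plot on a log axis.

    Unfinished races (inf) are drawn on an extra 'DNF' level placed
    dnf_factor times above the slowest finish time and marked with an x.
    dnf_factor is a hypothetical tuning parameter.
    """
    values = times.to_numpy(dtype=float)
    dnf_level = values[np.isfinite(values)].max() * dnf_factor
    x = np.arange(values.shape[1])

    fig, ax = plt.subplots()
    for xi in x:  # one vertical axis per track, as in parallel_coordinates
        ax.axvline(xi, color='black', linewidth=1)
    for racer, row in zip(times.index, values):
        dnf = np.isinf(row)
        y = np.where(dnf, dnf_level, row)  # move inf onto the DNF level
        line, = ax.plot(x, y, marker='o', label=racer)
        if dnf.any():  # big x in the racer's colour on each unfinished track
            ax.plot(x[dnf], y[dnf], linestyle='none', marker='x',
                    markersize=14, markeredgewidth=2, color=line.get_color())

    ax.set_yscale('log')
    ax.axhline(dnf_level, color='grey', linestyle=':')
    # Label the DNF level just right of the plot (x in axes, y in data coords)
    ax.text(1.01, dnf_level, 'DNF', transform=ax.get_yaxis_transform(), va='center')
    ax.set_xticks(x)
    ax.set_xticklabels(times.columns)
    ax.legend(loc='upper left')
    return fig, ax, dnf_level


times = pd.read_csv(io.StringIO(RACING_CSV)).set_index('RACER').astype(float)
fig, ax, dnf_level = plot_with_dnf(times)
# fig.savefig('racing_dnf.png')
```

Putting the DNF level on the same axis keeps the line connected, at the cost of a y position that has no real time meaning; the dotted guide line and the DNF label make that explicit.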
I also prepared a long-format CSV, racing2.csv, for ggplot (there may be a better way to do this):
RACER,TRACK,TIME
Superman,100m,0.1
Superman,400m,0.5
Superman,1600m,1
Lazyman,100m,200
Lazyman,400m,900
Lazyman,1600m,Inf
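One way to produce this long format from racing.csv instead of writing it by hand is pandas' melt. A minimal sketch, assuming the 1600m header and using an inline copy of the data; the ordered categorical keeps the tracks in distance order rather than alphabetical order:

```python
import io

import pandas as pd

# In-memory copy of racing.csv (wide format, one column per track)
RACING_CSV = """RACER,TRACK.100m,TRACK.400m,TRACK.1600m
Superman,0.1,0.5,1
Lazyman,200,900,Inf
"""

wide = pd.read_csv(io.StringIO(RACING_CSV))
# One row per (racer, track) pair, as in racing2.csv
tidy = wide.melt(id_vars='RACER', var_name='TRACK', value_name='TIME')
tidy['TRACK'] = tidy['TRACK'].str.replace('TRACK.', '', regex=False)
tidy['TIME'] = tidy['TIME'].astype(float)  # make sure 'Inf' becomes float inf
# Keep the tracks in distance order rather than alphabetical order
tidy['TRACK'] = pd.Categorical(tidy['TRACK'],
                               categories=['100m', '400m', '1600m'], ordered=True)
tidy = tidy.sort_values(['RACER', 'TRACK'], ignore_index=True)
# tidy.to_csv('racing2.csv', index=False)
```

The rows come out sorted by racer name, so Lazyman precedes Superman; ggplot does not depend on row order, only on the TRACK ordering.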
Using ggplot2:
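library(ggplot2)
d <- read.csv('racing2.csv')
# Keep tracks in distance order (otherwise they sort alphabetically: 100m, 1600m, 400m)
d$TRACK <- factor(d$TRACK, levels = c('100m', '400m', '1600m'))
g <- ggplot(d) + geom_line(aes(x = TRACK, y = TIME, group = RACER, color = RACER))
g <- g + scale_y_log10()
ggsave('ggplot.png', plot = g)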
I got closer:
this shows the infinite value, but doesn't annotate it in any way.
Any solution, in either Python or R, would be appreciated, as would suggestions on how best to mark unfinished races.
Upvotes: 0
Views: 480
Reputation: 13680
With R and ggplot2:
Build some bogus data:
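library(ggplot2)
library(dplyr)         # filter(); without it, stats::filter() is called and fails
library(RColorBrewer)  # brewer.pal()
df <- data.frame(ID = factor(c(rep(1, 3), rep(2, 3), rep(3, 3)), labels = c('Realman', 'Lazyman', 'Superman')),
                 race = factor(rep(seq(1, 3, 1), 3), labels = c('100m', '400m', '1600m')),
                 runTime = c(8.9, 20.5, 150.9, 100.1, 300.3, +Inf, 1.2, 5, +Inf))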
#         ID  race runTime
# 1  Realman  100m     8.9
# 2  Realman  400m    20.5
# 3  Realman 1600m   150.9
# 4  Lazyman  100m   100.1
# 5  Lazyman  400m   300.3
# 6  Lazyman 1600m     Inf
# 7 Superman  100m     1.2
# 8 Superman  400m     5.0
# 9 Superman 1600m     Inf
# Solid lines and large points: finished races only.
# Dashed lines and small points: all races; ggplot2 draws Inf at the edge of the panel.
# linewidth needs ggplot2 >= 3.4.0; use size = ... for lines on older versions.
ggplot(filter(df, runTime != +Inf), aes(x = race, y = runTime, group = ID, color = ID)) +
  geom_line(linewidth = 2) +
  geom_point(size = 4) +
  geom_line(data = df, linetype = 'dashed', linewidth = 1) +
  geom_point(data = df, shape = 21, size = 1) +
  geom_text(aes(label = runTime), position = position_nudge(y = -.1)) +
  scale_y_continuous(trans = 'log10', breaks = c(1, 10, 100, 1000)) +
  scale_x_discrete('Track') +
  scale_color_manual('Racer', values = brewer.pal(length(levels(df$ID)), 'Set1')) +
  theme(panel.background = element_blank(),
        panel.grid.major.x = element_line(colour = 'lightgrey', linewidth = 25),
        legend.position = 'top',
        axis.line.y = element_line(colour = 'black', linewidth = .5, arrow = arrow()))
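Since the question also asked for Python: ggplot2 places infinite values at the edge of the panel, which is why the dashed lines above run up to the top. A rough matplotlib equivalent (a sketch, not part of this answer) fixes the y-limits and substitutes the upper limit for inf. The hypothetical headroom parameter and the dictionary layout of the bogus data are assumptions:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; remove to show the figure interactively
import matplotlib.pyplot as plt
import numpy as np

TRACKS = ['100m', '400m', '1600m']
# Same bogus data as above; np.inf marks an unfinished race
RUN_TIMES = {
    'Realman': [8.9, 20.5, 150.9],
    'Lazyman': [100.1, 300.3, np.inf],
    'Superman': [1.2, 5.0, np.inf],
}


def plot_runs(run_times, tracks, headroom=2.0):
    """Solid lines for finished races; dashed lines over all races, with inf
    drawn at the top edge of the panel (emulating how ggplot2 places Inf).
    headroom is hypothetical: the y-limits are the finite min / headroom and
    the finite max * headroom."""
    data = {r: np.asarray(t, dtype=float) for r, t in run_times.items()}
    finite = np.concatenate([t[np.isfinite(t)] for t in data.values()])
    y_low, y_top = finite.min() / headroom, finite.max() * headroom

    fig, ax = plt.subplots()
    x = np.arange(len(tracks))
    for racer, times in data.items():
        done = np.isfinite(times)
        # Solid line and large points: finished races only
        line, = ax.plot(x[done], times[done], linewidth=2, marker='o',
                        markersize=8, label=racer)
        # Dashed line over all races, inf replaced by the top of the axis
        ax.plot(x, np.where(done, times, y_top), linestyle='--', linewidth=1,
                color=line.get_color())
    ax.set_yscale('log')
    ax.set_ylim(y_low, y_top)
    ax.set_xticks(x)
    ax.set_xticklabels(tracks)
    ax.set_xlabel('Track')
    ax.legend(title='Racer', loc='lower right')
    return fig, ax, y_top


fig, ax, y_top = plot_runs(RUN_TIMES, TRACKS)
# fig.savefig('runs.png')
```

Fixing the limits before substituting matters: if autoscaling ran after the substitution, the "infinite" points would simply become the largest data values rather than sitting on the panel edge.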
Upvotes: 2