Spencer Trinh

Reputation: 783

Multiple line plots using hex color code

I have a dataframe like so (the real one has 300+ rows):

        cline    endpt  fx     type  colours 
        SF-268   96.5   1       CNS  #848B9E
22      SF-268  103.3   2       CNS  #848B9E
23      SF-268   60.7   3       CNS  #848B9E
24      SF-268    5.0   4       CNS  #848B9E
25      SF-268    8.7   5       CNS  #848B9E
26      SF-268   -9.4   6       CNS  #848B9E
27      SF-268  -20.7   7       CNS  #848B9E
28      SNB-75  105.5   1       CNS  #848B9E
29      SNB-75   94.5   2       CNS  #848B9E
30      SNB-75   35.3   3       CNS  #848B9E
..         ...    ...  ..       ...      ...
71      SW-620   95.6   2     Colon  #468F14
72      SW-620   73.5   3     Colon  #468F14
73      SW-620    4.0   4     Colon  #468F14
74      SW-620    9.7   5     Colon  #468F14
75      SW-620  -58.6   6     Colon  #468F14
76      SW-620  -49.1   7     Colon  #468F14
77    CCRF-CEM   95.8   1  Leukemia  #FF041E
78    CCRF-CEM   96.6   2  Leukemia  #FF041E
79    CCRF-CEM   89.2   3  Leukemia  #FF041E
80    CCRF-CEM    3.5   4  Leukemia  #FF041E
81    CCRF-CEM   13.7   5  Leukemia  #FF041E
82    CCRF-CEM  -21.3   6  Leukemia  #FF041E
83    CCRF-CEM   -6.6   7  Leukemia  #FF041E
84   HL-60(TB)   93.9   1  Leukemia  #FF041E
85   HL-60(TB)   95.3   2  Leukemia  #FF041E
86   HL-60(TB)   94.0   3  Leukemia  #FF041E
87   HL-60(TB)   13.3   4  Leukemia  #FF041E
88   HL-60(TB)   14.6   5  Leukemia  #FF041E
89   HL-60(TB)  -44.0   6  Leukemia  #FF041E
90   HL-60(TB)  -57.0   7  Leukemia  #FF041E
91       K-562   88.1   1  Leukemia  #FF041E
92       K-562   97.1   2  Leukemia  #FF041E
93       K-562   73.6   3  Leukemia  #FF041E
94       K-562    6.6   4  Leukemia  #FF041E
95       K-562    7.0   5  Leukemia  #FF041E
96       K-562  -21.9   6  Leukemia  #FF041E
97       K-562  -29.6   7  Leukemia  #FF041E
98      MOLT-4   98.9   1  Leukemia  #FF041E
99      MOLT-4   96.8   2  Leukemia  #FF041E
100     MOLT-4   68.9   3  Leukemia  #FF041E

I used the following examples to help me produce my code at the bottom:

I managed to get a plot, but I think the line connects the last y value of one series to the first y value of the next, drawing a straight line across the plot (image below). I'm not sure why. Any help would be appreciated. Thanks.


import csv
import numpy as np
import pandas as pd
import itertools
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
labels = []
for key, grp in dfm.groupby(['colours']):
    ax = grp.plot(ax=ax,linestyle='-',marker='s',x='fx',y='endpt',c=key)
    labels.append(key)
lines, _ = ax.get_legend_handles_labels()
g=[]
for i in labels:
    g.append(list(co.keys())[list(co.values()).index(i)])
ax.legend(lines, g, loc='best')   

enter image description here

Upvotes: 0

Views: 1500

Answers (2)

ImportanceOfBeingErnest

Reputation: 339630

The problem is that the values on the x axis (fx) are not monotonically increasing. The line therefore jumps back whenever the x values drop from 7 back to 1. To avoid this, one can insert NaN into the arrays to be plotted at the positions where this jump would occur; matplotlib does not draw a line segment through a NaN point. This can be done like

g = lambda x,y: np.insert(y.astype(float), np.arange(len(x)-1)[np.diff(x) < 0]+1, np.nan)

where x is the array of x values and y is the array into which the NaNs are inserted. Plotting is then done by calling this function on both the x and the y values:

ax.plot(g(x,x), g(x,y),marker='s')
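To make the effect of the helper concrete, here is a minimal sketch on made-up data. The function name `insert_nan_at_resets` and the sample arrays are illustrative assumptions, not part of the question; the logic is the same as the lambda above, just written out step by step.

```python
import numpy as np

def insert_nan_at_resets(x, y):
    """Return y as floats, with NaN inserted wherever x decreases."""
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    # np.diff(x) < 0 is True at position i when x[i+1] < x[i];
    # adding 1 gives the index in front of which NaN is inserted.
    breaks = np.flatnonzero(np.diff(x) < 0) + 1
    return np.insert(y, breaks, np.nan)

# Made-up data: two series of three points each, stored back to back.
x = np.array([1, 2, 3, 1, 2, 3])
y = np.array([10, 20, 30, 40, 50, 60])

print(insert_nan_at_resets(x, x))  # values: 1, 2, 3, nan, 1, 2, 3
print(insert_nan_at_resets(x, y))  # values: 10, 20, 30, nan, 40, 50, 60
```

The helper has to be applied to both the x and the y array so that the two stay the same length and the NaN lands at the same position in each.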

A solution using a DataFrame is shown below.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

x = list(range(1, 8)) * 4  # a range object cannot be repeated with * in Python 3
y = np.array([np.exp(-np.arange(1,8)/3.)*i+i/2. for i in np.arange(1,5)/10.]).flatten()
df = pd.DataFrame({"x":x, "y":y})
print(df)
fig, (ax,ax2) = plt.subplots(ncols=2)

df.plot(x='x',y='y',ax=ax,marker='s')


g = lambda x,y: np.insert(y.astype(float), np.arange(len(x)-1)[np.diff(x) < 0]+1, np.nan)
ax2.plot(g(df.x.values,df.x.values), g(df.x.values,df.y.values),marker='s')
plt.show()

enter image description here

A full example of grouping by colors:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

x = list(range(1, 8)) * 4  # a range object cannot be repeated with * in Python 3
y = np.array([np.exp(-np.arange(1,8)/3.)*i+i/2. for i in np.arange(1,5)/10.]).flatten()
df = pd.DataFrame({"x":x, "y":y, "colours": ["#aa0000"]*len(x)})
x2 = list(range(1, 6)) * 3
y2 = np.array([np.exp(-np.arange(1,6)/2.5)*i+i/2.1 for i in np.arange(1,4)/10.]).flatten()
df2 = pd.DataFrame({"x":x2, "y":y2, "colours": ["#0000aa"]*len(x2)})
df = pd.concat([df, df2])  # DataFrame.append was removed in pandas 2.0


fig, ax = plt.subplots()

g = lambda x,y: np.insert(y.astype(float), np.arange(len(x)-1)[np.diff(x) < 0]+1, np.nan)

# group by the column name; a one-element list yields tuple keys in recent pandas
for key, grp in df.groupby('colours'):
    ax.plot(g(grp.x.values,grp.x.values), g(grp.x.values,grp.y.values),
            marker='s', color=key, label=key)

ax.legend()
plt.show()

enter image description here
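A different route, not the NaN approach used in this answer, is to draw one line per cell line, so that no line ever spans two series. The sketch below assumes column names like those in the question (`cline`, `endpt`, `fx`, `type`, `colours`); the small frame is made-up stand-in data, and labelling the legend by `type` is an assumption about what the legend should show.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the asker's dfm: cell lines A and B share a colour, C has its own.
dfm = pd.DataFrame({
    "cline":   ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "endpt":   [96.5, 60.7, -9.4, 105.5, 35.3, -20.0, 95.6, 4.0, -58.6],
    "fx":      [1, 2, 3] * 3,
    "type":    ["CNS"] * 6 + ["Colon"] * 3,
    "colours": ["#848B9E"] * 6 + ["#468F14"] * 3,
})

fig, ax = plt.subplots()
seen = set()
# sort=False keeps the groups in the order they first appear in the frame
for (ctype, cline), grp in dfm.groupby(["type", "cline"], sort=False):
    colour = grp["colours"].iloc[0]  # every row of a cell line carries the same colour
    # label only the first line of each type; labels starting with "_" are left out of the legend
    label = ctype if ctype not in seen else "_nolegend_"
    seen.add(ctype)
    ax.plot(grp["fx"], grp["endpt"], linestyle="-", marker="s",
            color=colour, label=label)
ax.legend(loc="best")
```

To view the result, call `plt.show()` with an interactive backend or `fig.savefig(...)` with a file name of your choice.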

Upvotes: 1

devnull

Reputation: 56

Your data seems to be unsorted. It sounds like you want to sort each group by increasing x value after grouping:

grp.sort_values(by="fx")
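One detail worth spelling out: `sort_values` returns a new DataFrame and leaves the original untouched, so the result has to be assigned before plotting. A minimal sketch with made-up values follows; it assumes the group holds a single series, because sorting a group that contains several cell lines would interleave their points.

```python
import pandas as pd

# Made-up single-series group whose rows are out of order
grp = pd.DataFrame({"fx": [3, 1, 2], "endpt": [60.7, 96.5, 103.3]})

grp.sort_values(by="fx")               # result discarded; grp is unchanged
grp_sorted = grp.sort_values(by="fx")  # keep the sorted copy

print(grp["fx"].tolist())         # [3, 1, 2]
print(grp_sorted["fx"].tolist())  # [1, 2, 3]
```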

Upvotes: 0
