I have a dataframe like so (the real one has 300+ rows):
cline endpt fx type colours
21 SF-268 96.5 1 CNS #848B9E
22 SF-268 103.3 2 CNS #848B9E
23 SF-268 60.7 3 CNS #848B9E
24 SF-268 5.0 4 CNS #848B9E
25 SF-268 8.7 5 CNS #848B9E
26 SF-268 -9.4 6 CNS #848B9E
27 SF-268 -20.7 7 CNS #848B9E
28 SNB-75 105.5 1 CNS #848B9E
29 SNB-75 94.5 2 CNS #848B9E
30 SNB-75 35.3 3 CNS #848B9E
.. ... ... .. ... ...
71 SW-620 95.6 2 Colon #468F14
72 SW-620 73.5 3 Colon #468F14
73 SW-620 4.0 4 Colon #468F14
74 SW-620 9.7 5 Colon #468F14
75 SW-620 -58.6 6 Colon #468F14
76 SW-620 -49.1 7 Colon #468F14
77 CCRF-CEM 95.8 1 Leukemia #FF041E
78 CCRF-CEM 96.6 2 Leukemia #FF041E
79 CCRF-CEM 89.2 3 Leukemia #FF041E
80 CCRF-CEM 3.5 4 Leukemia #FF041E
81 CCRF-CEM 13.7 5 Leukemia #FF041E
82 CCRF-CEM -21.3 6 Leukemia #FF041E
83 CCRF-CEM -6.6 7 Leukemia #FF041E
84 HL-60(TB) 93.9 1 Leukemia #FF041E
85 HL-60(TB) 95.3 2 Leukemia #FF041E
86 HL-60(TB) 94.0 3 Leukemia #FF041E
87 HL-60(TB) 13.3 4 Leukemia #FF041E
88 HL-60(TB) 14.6 5 Leukemia #FF041E
89 HL-60(TB) -44.0 6 Leukemia #FF041E
90 HL-60(TB) -57.0 7 Leukemia #FF041E
91 K-562 88.1 1 Leukemia #FF041E
92 K-562 97.1 2 Leukemia #FF041E
93 K-562 73.6 3 Leukemia #FF041E
94 K-562 6.6 4 Leukemia #FF041E
95 K-562 7.0 5 Leukemia #FF041E
96 K-562 -21.9 6 Leukemia #FF041E
97 K-562 -29.6 7 Leukemia #FF041E
98 MOLT-4 98.9 1 Leukemia #FF041E
99 MOLT-4 96.8 2 Leukemia #FF041E
100 MOLT-4 68.9 3 Leukemia #FF041E
I used the following examples to help me produce my code at the bottom:
I managed to get a plot; however, the line plot seems to connect the last y value of each series (fx = 7) back to the first one (fx = 1), drawing a straight line across the plot. I'm not sure why. Any help would be appreciated. Thanks.
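import csv
import numpy as np
import pandas as pd
import itertools
import matplotlib.pyplot as plt

# dfm is the DataFrame shown above; co maps each tissue type to its colour,
# e.g. {'CNS': '#848B9E', 'Colon': '#468F14', 'Leukemia': '#FF041E'}
fig, ax = plt.subplots()
labels = []
# group by the column name (not a one-element list) so that key is the colour string
for key, grp in dfm.groupby('colours'):
    # one line per colour group: fx on the x axis, endpt on the y axis
    ax = grp.plot(ax=ax, linestyle='-', marker='s', x='fx', y='endpt', c=key)
    labels.append(key)

# swap the colour codes in the legend for the tissue type names (reverse lookup in co)
lines, _ = ax.get_legend_handles_labels()
g = []
for i in labels:
    g.append(list(co.keys())[list(co.values()).index(i)])
ax.legend(lines, g, loc='best')
plt.show()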
Upvotes: 0
The problem is that the values on the x axis (fx) are not monotonically increasing. Therefore, the line jumps back whenever the x value drops from 7 back to 1. To avoid this, one may insert nan into the arrays to be plotted at the positions where such a jump occurs. Matplotlib does not draw a line segment to or from a nan point, so each run is then drawn as a separate piece. This can be done like this:
g = lambda x,y: np.insert(y.astype(float), np.arange(len(x)-1)[np.diff(x) < 0]+1, np.nan)
where x is the array of x values and y is the array into which the nans are inserted. Plotting can then be done by calling this function on the x and y values:
ax.plot(g(x,x), g(x,y),marker='s')
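To see what the lambda does, here is the same expression written out as a named function and applied to the first three fx values of two cell lines from the question's table. The name insert_breaks is hypothetical (it is not part of NumPy); the body is the same np.insert / np.diff expression as above.

```python
import numpy as np

def insert_breaks(x, y):
    """Return y as floats with NaN inserted wherever x decreases (hypothetical helper)."""
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    # np.diff(x) < 0 flags each position i where x[i + 1] < x[i];
    # inserting at i + 1 puts the NaN between the end of one run and the start of the next
    breaks = np.arange(len(x) - 1)[np.diff(x) < 0] + 1
    return np.insert(y, breaks, np.nan)

# fx = 1..3 for SF-268 followed by fx = 1..3 for SNB-75, taken from the table above
fx = [1, 2, 3, 1, 2, 3]
endpt = [96.5, 103.3, 60.7, 105.5, 94.5, 35.3]

xs = insert_breaks(fx, fx)     # 1, 2, 3, nan, 1, 2, 3
ys = insert_breaks(fx, endpt)  # 96.5, 103.3, 60.7, nan, 105.5, 94.5, 35.3
```

Because x is passed through the same function, the nan lands at the same index in both arrays, which is what keeps the x and y values aligned. np.flatnonzero(np.diff(x) < 0) + 1 computes the same break indices if you find that easier to read.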
A solution using a DataFrame is shown below.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# four runs of x = 1..7, each with its own decaying curve
x = list(range(1, 8)) * 4
y = np.array([np.exp(-np.arange(1, 8) / 3.) * i + i / 2. for i in np.arange(1, 5) / 10.]).flatten()
df = pd.DataFrame({"x": x, "y": y})
print(df)

fig, (ax, ax2) = plt.subplots(ncols=2)
# left: plain plot, showing the jumps from x=7 back to x=1
df.plot(x='x', y='y', ax=ax, marker='s')
# right: nan inserted at each jump, so the runs are drawn separately
g = lambda x, y: np.insert(y.astype(float), np.arange(len(x) - 1)[np.diff(x) < 0] + 1, np.nan)
ax2.plot(g(df.x.values, df.x.values), g(df.x.values, df.y.values), marker='s')
plt.show()
A full example of grouping by colors:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

x = list(range(1, 8)) * 4
y = np.array([np.exp(-np.arange(1, 8) / 3.) * i + i / 2. for i in np.arange(1, 5) / 10.]).flatten()
df = pd.DataFrame({"x": x, "y": y, "colours": ["#aa0000"] * len(x)})
x2 = list(range(1, 6)) * 3
y2 = np.array([np.exp(-np.arange(1, 6) / 2.5) * i + i / 2.1 for i in np.arange(1, 4) / 10.]).flatten()
df2 = pd.DataFrame({"x": x2, "y": y2, "colours": ["#0000aa"] * len(x2)})
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df, df2])

fig, ax = plt.subplots()
g = lambda x, y: np.insert(y.astype(float), np.arange(len(x) - 1)[np.diff(x) < 0] + 1, np.nan)
# group by the column name (not a one-element list) so that key is the colour string
for key, grp in df.groupby('colours'):
    ax.plot(g(grp.x.values, grp.x.values), g(grp.x.values, grp.y.values),
            marker='s', color=key, label=key)
ax.legend()
plt.show()
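Applied to the question's own columns (cline, endpt, fx, type, colours), a sketch might look like the following. It uses only a few rows copied from the table above, and the helper name with_breaks is hypothetical. Grouping by colours and type together, rather than colours alone, is an assumption made here: since each colour belongs to one tissue type, it lets the legend show the type name directly instead of doing the reverse lookup in co.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def with_breaks(x, y):
    """Return y as floats with NaN inserted wherever x decreases (hypothetical helper)."""
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    breaks = np.flatnonzero(np.diff(x) < 0) + 1
    return np.insert(y, breaks, np.nan)

# a few rows copied from the question's table (fx = 1..3 only)
dfm = pd.DataFrame({
    "cline": ["SF-268"] * 3 + ["SNB-75"] * 3 + ["CCRF-CEM"] * 3 + ["K-562"] * 3,
    "endpt": [96.5, 103.3, 60.7, 105.5, 94.5, 35.3,
              95.8, 96.6, 89.2, 88.1, 97.1, 73.6],
    "fx": [1, 2, 3] * 4,
    "type": ["CNS"] * 6 + ["Leukemia"] * 6,
    "colours": ["#848B9E"] * 6 + ["#FF041E"] * 6,
})

fig, ax = plt.subplots()
# sort=False keeps the groups in order of first appearance
for (colour, tissue), grp in dfm.groupby(["colours", "type"], sort=False):
    fx = grp["fx"].to_numpy()
    # one Line2D per tissue type; the NaNs split it into one piece per cell line
    ax.plot(with_breaks(fx, fx), with_breaks(fx, grp["endpt"]),
            marker="s", color=colour, label=tissue)
ax.set_xlabel("fx")
ax.set_ylabel("endpt")
ax.legend(loc="best")
```

This relies on each cell line's rows being contiguous and already in fx order within the group, as they are in the question's table; call plt.show() afterwards to display the figure.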
Upvotes: 1
Your data seems to be unsorted. It sounds like you want to sort each group by increasing x value after grouping it:
grp.sort_values(by="fx")
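A minimal sketch of that idea in context, under one assumption: the goal is one line per cell line. If a colour group holds several cell lines, sorting the whole group by fx would interleave them (1, 1, 2, 2, ...), so this sketch groups by both colours and cline before sorting. The rows are a few from the question's table, deliberately shuffled.

```python
import pandas as pd
import matplotlib.pyplot as plt

# a few rows from the question, deliberately out of fx order
dfm = pd.DataFrame({
    "cline": ["SF-268", "SF-268", "SF-268", "SNB-75", "SNB-75", "SNB-75"],
    "endpt": [60.7, 96.5, 103.3, 35.3, 105.5, 94.5],
    "fx": [3, 1, 2, 3, 1, 2],
    "type": ["CNS"] * 6,
    "colours": ["#848B9E"] * 6,
})

fig, ax = plt.subplots()
seen = set()  # tissue types that already have a legend entry
for (colour, cline), grp in dfm.groupby(["colours", "cline"], sort=False):
    grp = grp.sort_values(by="fx")  # sort_values returns a new DataFrame, so reassign it
    tissue = grp["type"].iloc[0]
    # label only the first line of each tissue type; labels starting with "_" are left out of the legend
    label = tissue if tissue not in seen else "_nolegend_"
    seen.add(tissue)
    ax.plot(grp["fx"].to_numpy(), grp["endpt"].to_numpy(),
            marker="s", color=colour, label=label)
ax.legend(loc="best")
```

Unlike the nan approach, this draws one Line2D per cell line, which is simpler to reason about but produces more artists for a 300+ row frame.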
Upvotes: 0