Reputation: 461
I am trying to build a script that loops through a range and saves a figure of each principal component plotted against every other principal component, without duplicates and without plotting a component against itself. For example, for a symmetric 3 by 3 matrix I would get 3 meaningful figures: fig_1_2, fig_1_3 and fig_2_3. I came up with this buggy solution:
#!/usr/env python
import mdp
import numpy as np
import matplotlib.pyplot as plt
#
set1 = 'set1_smthing.txt'
set2 = 'set2_smthing.txt'
set3 = 'set3_smthing.txt'
bname = set1.split(".")[0].split("_")[0]
set1d = np.loadtxt(set1, delimiter=',')
set2d = np.loadtxt(set2, delimiter=',')
set3d = np.loadtxt(fchembl, delimiter=',')
set_comb = np.vstack([set1d,set2d,set3d])
# performing PCA with MDP
set_comb_pca = mdp.pca(set_comb,svd=True)
pcan = mdp.nodes.PCANode(output_dim=3)
pcar = pcan.execute(set_comb)
# graph the results - lower triangle
for i in range(1,6):
    for j in range(1,6):
        if i != j and i < j:
            fig = plt.figure()
            ax = fig.add_subplot(111)
            ax.plot(pcar[(len(set1d)+1):(len(set1d)+len(set2d)), i], pcar[(len(set1d)+1):(len(set1d)+len(set2d)), j], marker='.', color='grey',linestyle="None")
            ax.plot(pcar[(len(set1d)+len(set2d)):, i], pcar[(len(set1d)+len(set2d)):, j], marker='.', color='blue',linestyle="None")
            ax.plot(pcar[1:len(set1d),i], pcar[1:len(set1d), j], marker='.', color='red',linestyle="None")
            # labels and title
            ax.set_xlabel('PC' + str(i) + '(%.3f%%)' % (pcan.d[i]))
            ax.set_ylabel('PC' + str(j) + '(%.3f%%)' % (pcan.d[j]))
            plt.title(gname)
            gname = bname + "_pc" + str(i) + "_vs_" + "pc" + str(j)
            plt.title(bname)
            # saving image
            fig.savefig(gname + ".png")
            plt.close(fig)
The script produces only one figure, PC1 vs PC2, and then quits. It seems that my bug is in the enumeration. Can you please suggest a correction? I tested it with print gname and everything looks OK. The output of the buggy script is the following:
<matplotlib.text.Text object at 0x11817e10>
[<matplotlib.lines.Line2D object at 0x11814610>]
[<matplotlib.lines.Line2D object at 0xd2d7710>]
[<matplotlib.lines.Line2D object at 0xd2d7bd0>]
<matplotlib.text.Text object at 0x11812690>
<matplotlib.text.Text object at 0x11814d10>
<matplotlib.text.Text object at 0x11817e10>
<matplotlib.text.Text object at 0xd2ff090>
Traceback (most recent call last):
File "<stdin>", line 9, in <module>
IndexError: invalid index
Upvotes: 0
Views: 1562
Reputation: 4717
I can tell you are an ex-Matlab programmer. You will learn to love 0-based indexing!
You have an IndexError, but it is difficult to debug because you are indexing so many different things, often multiple times on each line. Put the code in a script and run it (run main.py in ipython, or python main.py from the terminal) and at least you will know where the error is occurring. I suspect you have an off-by-one error since you seem to be using 1-based indexing.
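To illustrate how 1-based loops could lead to this kind of IndexError, here is a minimal sketch. It assumes the projected data has 3 columns (as output_dim=3 in the question suggests) and uses a random array, pcar_demo, as a hypothetical stand-in for pcar. Under that assumption only the pair (1, 2) fits inside the array, which would match a script that saves one figure and then stops.

```python
import numpy as np

# Hypothetical stand-in for the PCA output: 10 samples, 3 components.
pcar_demo = np.random.rand(10, 3)

# Column pairs (i, j) with i < j that loops over range(1, 6) would visit.
pairs = [(i, j) for i in range(1, 6) for j in range(1, 6) if i < j]

# Only column indices 0, 1 and 2 exist, so keep the pairs that fit.
n_cols = pcar_demo.shape[1]
valid = [(i, j) for (i, j) in pairs if j < n_cols]

# The first pair that reaches past the last column raises IndexError.
first_bad = next((i, j) for (i, j) in pairs if j >= n_cols)
try:
    pcar_demo[:, first_bad[1]]
    raised = False
except IndexError:
    raised = True

# A 1-based slice does not fail, but it silently drops the first row.
rows_kept = len(pcar_demo[1:5])
```

Note that slices such as pcar[1:len(set1d)] never raise an error; they just skip row 0, so that kind of off-by-one only shows up as missing points in the plot.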
How about:
# Extract PCA components for each set
pca1 = pcar[:len(set1d)]
pca2 = pcar[len(set1d):len(set1d)+len(set2d)]
pca3 = pcar[-len(set3d):]
# Iterate over each pair of components
for i in range(3):
    for j in range(i+1, 3):
        f = plt.figure()
        ax = f.add_subplot(111)
        ax.plot(pca1[:, i], pca1[:, j], 'b.')
        ax.plot(pca2[:, i], pca2[:, j], 'r.')
        ax.plot(pca3[:, i], pca3[:, j], 'g.')
        ax.set_xlabel('PC%d' % i)
        ax.set_ylabel('PC%d' % j)
        plt.savefig('PC%d_vs_PC%d.png' % (i, j))
        plt.close(f)
By the way, in general I find it very useful to separate calculation code from plotting code. That is why I suggest separating the datasets before you enter the loops. You want your plotting code to focus on the plotting, not on complicated indexing.
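One possible way to take that separation a step further is sketched below. The helper names split_rows, component_pairs and plot_pair are hypothetical, not part of MDP or matplotlib, and the demo array is made-up data standing in for pcar. The plotting helper only touches an Axes object that is passed in, so all the indexing lives in the calculation helpers.

```python
from itertools import combinations

import numpy as np


def split_rows(data, lengths):
    """Split the rows of `data` into consecutive blocks of the given lengths."""
    bounds = np.cumsum(lengths)[:-1]
    return np.split(data, bounds)


def component_pairs(n_components):
    """Return every unordered pair (i, j) with i < j, using 0-based indices."""
    return list(combinations(range(n_components), 2))


def plot_pair(ax, groups, i, j, styles=('b.', 'r.', 'g.')):
    """Draw component i against component j for each group on an existing Axes."""
    for group, style in zip(groups, styles):
        ax.plot(group[:, i], group[:, j], style)
    ax.set_xlabel('PC%d' % i)
    ax.set_ylabel('PC%d' % j)


# Hypothetical stand-in data: 4 + 3 + 5 samples with 3 components each.
demo = np.arange(36, dtype=float).reshape(12, 3)
groups = split_rows(demo, [4, 3, 5])
pairs = component_pairs(3)
```

In the real script, groups would come from split_rows(pcar, [len(set1d), len(set2d), len(set3d)]), and the loop body would create a figure, call plot_pair(ax, groups, i, j) for each (i, j) in pairs, then save and close the figure as in the loop above.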
Upvotes: 3