Reputation: 282
I have a question about Python:
how can I print a matrix nicely, with headers, like this:
T C G C A
[0 -2 -4 -6 -8 -10]
T [-2 1 -1 -3 -5 -7]
C [-4 -1 2 0 -2 -4]
C [-6 -3 0 1 1 -1]
A [-8 -5 -2 -1 0 2]
I've tried to print it with numpy.matrix(mat), but all I got was:
[[ 0 -2 -4 -6 -8 -10]
[ -2 1 -1 -3 -5 -7]
[ -4 -1 2 0 -2 -4]
[ -6 -3 0 1 1 -1]
[ -8 -5 -2 -1 0 2]]
I also didn't manage to add the headers.
Thanks!!!
Thank you all. I've managed to install pandas, but I have two new problems. Here is my code:
import pandas as pd
# mat is the 5x6 numpy array shown above
col1 = [' ', 'T', 'C', 'G', 'C', 'A']
col2 = [' ', 'T', 'C', 'C', 'A']
df = pd.DataFrame(mat, index=col2, columns=col1)
print(df)
But I get this error:
    df = pd.DataFrame(mat,index = col2, columns = col1)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 163, in __init__
    copy=copy)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 224, in _init_ndarray
    return BlockManager([block], [columns, index])
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 237, in __init__
    self._verify_integrity()
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 313, in _verify_integrity
    union_items = _union_block_items(self.blocks)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 906, in _union_block_items
    raise Exception('item names overlap')
Exception: item names overlap
When I rename one of the duplicate column labels (the first C becomes B), it works:
T B G C A
0 -2 -4 -6 -8 -10
T -2 1 -1 -3 -5 -7
C -4 -1 2 0 -2 -4
C -6 -3 0 1 1 -1
A -8 -5 -2 -1 0 2
But as you can see, the layout of the matrix is not quite right. How can I fix these problems?
Upvotes: 4
Views: 12031
Reputation: 339330
NumPy does not provide such functionality out of the box.
You may want to look into pandas. Printing a pandas.DataFrame
usually looks quite nice.
import numpy as np
import pandas as pd
cols = ["T", "C", "S", "W", "Q"]
a = np.random.randint(0,11,size=(5,5))
df = pd.DataFrame(a, columns=cols, index=cols)
print(df)
will produce
T C S W Q
T 9 5 10 0 0
C 3 8 0 7 2
S 0 2 6 5 8
W 4 4 10 1 5
Q 3 8 7 1 4
If you don't have pandas available, you can use the following function, which needs only plain Python and NumPy.
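import numpy as np

def print_array(a, cols, rows):
    # One label per column and one label per row are required
    if (len(cols) != a.shape[1]) or (len(rows) != a.shape[0]):
        print("Shapes do not match")
        return
    s = a.__repr__()                # "array([[ 2,  5, ...],\n       [...]])"
    s = s.split("array(")[1]
    s = s.replace("      ", "")     # remove 6 of the 7 spaces indenting continuation rows
    s = s.replace("[[", " [")
    s = s.replace("]])", "]")
    # Comma positions in the first row mark where each column ends
    pos = [i for i, ltr in enumerate(s.splitlines()[0]) if ltr == ","]
    pos[-1] = pos[-1]-1             # the last comma follows "]", so step back one
    empty = " " * len(s.splitlines()[0])
    s = s.replace("],", "]")
    s = s.replace(",", "")
    lines = []
    for i, l in enumerate(s.splitlines()):
        lines.append(rows[i] + l)   # prefix each row with its label
    s = "\n".join(lines)
    # Place column labels; shift left by i for the i commas removed before them
    empty = list(empty)
    for i, p in enumerate(pos):
        empty[p-i] = cols[i]
    s = "".join(empty) + "\n" + s
    print(s)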
c = [" ", "T", "C", "G", "C", "A"]
r = [" ", "T", "C", "C", "A" ]
a = np.random.randint(-4,15,size=(5,6))
print_array(a, c, r)
giving you
T C G C A
[ 2 5 -3 7 1 9]
T [-3 10 3 -4 8 3]
C [ 6 11 -2 2 5 1]
C [ 4 6 14 11 10 0]
A [11 -4 -3 -4 14 14]
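A different sketch, not part of the answer above: instead of parsing `repr()`, convert every entry to a string, pick one width for the whole table, and right-justify everything with `str.rjust`. It needs only plain Python (a 2-D NumPy array also works, since it is only iterated over). `format_labeled` is a hypothetical helper name, and the matrix is the one from the question.

```python
def format_labeled(rows, col_labels, row_labels):
    """Return a labelled text table with every column right-aligned to one width."""
    col_labels = [str(c) for c in col_labels]
    row_labels = [str(r) for r in row_labels]
    cells = [[str(x) for x in row] for row in rows]
    if len(cells) != len(row_labels) or any(len(r) != len(col_labels) for r in cells):
        raise ValueError("label counts do not match the matrix shape")
    # One common width: the widest number or column label anywhere in the table
    width = max(len(s) for s in col_labels + [x for r in cells for x in r])
    label_width = max(len(s) for s in row_labels)
    # Header: blank space where the row labels go, then the right-justified column labels
    lines = [" " * label_width + " " + " ".join(c.rjust(width) for c in col_labels)]
    for label, row in zip(row_labels, cells):
        lines.append(label.ljust(label_width) + " " + " ".join(x.rjust(width) for x in row))
    return "\n".join(lines)


mat = [[ 0, -2, -4, -6, -8, -10],
       [-2,  1, -1, -3, -5,  -7],
       [-4, -1,  2,  0, -2,  -4],
       [-6, -3,  0,  1,  1,  -1],
       [-8, -5, -2, -1,  0,   2]]

# A blank first label leaves the first row and first column unlabelled
print(format_labeled(mat, " TCGCA", " TCCA"))
```

Because the width is computed once for the whole table, the brackets are not needed to keep the columns lined up.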
Upvotes: 4
Reputation: 231385
Here's a quick version of adding labels with plain Python and numpy.
Define a function that writes the lines. Here it just prints them, but it could be set up to print to a file, or to collect all the lines in a list and return that.
def pp(arr, lbl):
    print(' ', ' '.join(lbl))
    for i in range(len(lbl)):   # one output row per label
        print('%s %s' % (lbl[i], arr[i]))
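A minimal sketch of the two variants just mentioned, collecting the lines in a list or writing them to any text stream. `pp_lines` and `pp_to` are hypothetical helper names; the layout is meant to match `pp`, assuming Python 3's `print`.

```python
import io
import sys

import numpy as np


def pp_lines(arr, lbl):
    """Same layout as pp(), but collect the lines in a list and return them."""
    lines = ["  " + " ".join(lbl)]            # header: two spaces, then the labels
    for name, row in zip(lbl, arr):           # one label per row
        lines.append("%s %s" % (name, row))   # row is a 1-D array, shown as [a b c]
    return lines


def pp_to(arr, lbl, out=None):
    """Write the same lines to a text stream: an open file, io.StringIO, or stdout."""
    out = sys.stdout if out is None else out
    for line in pp_lines(arr, lbl):
        print(line, file=out)


arr = np.arange(16).reshape(4, 4)
buf = io.StringIO()
pp_to(arr, list("ABCD"), out=buf)   # capture the output instead of printing it
print(buf.getvalue())
```

Passing `open("table.txt", "w")` as `out` would send the same lines to a file instead.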
In [65]: arr=np.arange(16).reshape(4,4)
The default display for a 2d array:
In [66]: print(arr)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
In [67]: lbl=list('ABCD')
In [68]: pp(arr,lbl)
A B C D
A [0 1 2 3]
B [4 5 6 7]
C [ 8 9 10 11]
D [12 13 14 15]
The spacing is off because each row is printed as its own 1-D array, so numpy picks a different element width for each row. But it's a start.
It looks better when every row needs the same width, as with this random sample:
In [69]: arr = np.random.randint(0,25,(4,4))
In [70]: arr
Out[70]:
array([[24, 12, 12, 6],
[22, 16, 18, 6],
[21, 16, 0, 23],
[ 2, 2, 19, 6]])
In [71]: pp(arr,lbl)
A B C D
A [24 12 12 6]
B [22 16 18 6]
C [21 16 0 23]
D [ 2 2 19 6]
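One way around the per-row widths, sketched under the assumption that the labels are short strings: compute a single width from every entry of the whole array and right-justify each number to it, keeping the bracketed look of `pp`. `pp_aligned` is a hypothetical name, not a NumPy function.

```python
import numpy as np


def pp_aligned(arr, lbl):
    """Like pp(), but with one element width for the whole array; returns the lines."""
    width = max(len(str(x)) for x in arr.flat)         # widest entry anywhere
    width = max([width] + [len(s) for s in lbl])       # labels must fit too
    label_width = max(len(s) for s in lbl)
    # Each row starts with "<label> [", which is label_width + 2 characters wide
    lines = [" " * (label_width + 2) + " ".join(s.rjust(width) for s in lbl)]
    for name, row in zip(lbl, arr):
        body = " ".join(str(x).rjust(width) for x in row)
        lines.append("%s [%s]" % (name.ljust(label_width), body))
    return lines


arr = np.arange(16).reshape(4, 4)
print("\n".join(pp_aligned(arr, list("ABCD"))))
```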
Upvotes: 0
Reputation: 221574
Consider a sample array -
In [334]: arr = np.random.randint(0,25,(5,6))
In [335]: arr
Out[335]:
array([[24, 8, 6, 10, 5, 11],
[11, 5, 19, 6, 10, 5],
[ 6, 2, 0, 12, 6, 17],
[13, 20, 14, 10, 18, 9],
[ 9, 4, 4, 24, 24, 8]])
We can use a pandas DataFrame, like so -
import pandas as pd
In [336]: print(pd.DataFrame(arr,columns=list(' TCGCA'),index=list(' TCCA')))
T C G C A
24 8 6 10 5 11
T 11 5 19 6 10 5
C 6 2 0 12 6 17
C 13 20 14 10 18 9
A 9 4 4 24 24 8
Note that a pandas DataFrame expects labels (column IDs for the header and index entries for the rows) for every row and column. So, to leave the first row and the first column unlabeled, we use label strings whose first character is a space: ' TCGCA' and ' TCCA'.
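The `Exception: item names overlap` in the question appears to come from the duplicate column label C in the pandas version used there, since renaming one C made it go away. A sketch that sidesteps duplicate column labels entirely: keep pandas' default integer column labels and pass the letters only as display aliases through `DataFrame.to_string(header=...)`. This assumes a pandas version where `header` accepts a list of aliases, as current releases do; the old version in the traceback may not.

```python
import numpy as np
import pandas as pd

# The matrix from the question
mat = np.array([[ 0, -2, -4, -6, -8, -10],
                [-2,  1, -1, -3, -5,  -7],
                [-4, -1,  2,  0, -2,  -4],
                [-6, -3,  0,  1,  1,  -1],
                [-8, -5, -2, -1,  0,   2]])

# Columns keep their default labels 0..5, so no two columns share a name
df = pd.DataFrame(mat, index=list(" TCCA"))

# The letters are used only when rendering; ' ' leaves the first column unlabelled
text = df.to_string(header=list(" TCGCA"))
print(text)
```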
Upvotes: 1