Reputation: 313
I am stuck (and in a bit of a time crunch) and was hoping for some help. This is probably a simple task but I can't seem to solve it..
I have a matrix, say 5 by 5, with an additional starting column of names for the rows and the same names for the columns in a text file like this:
b e a d c
b 0.0 0.1 0.3 0.2 0.5
e 0.1 0.0 0.4 0.9 0.3
a 0.3 0.4 0.0 0.7 0.6
d 0.2 0.9 0.7 0.0 0.1
c 0.5 0.3 0.6 0.1 0.0
I have multiple files that have the same format and size of matrix but the order of the names are different. I need a way to change these around so they are all the same and maintain the 0.0 diagonal. So any swapping I do to the columns I must do to the rows.
I have been searching a bit and it seems like NumPy might do what I want but I have never worked with it or arrays in general. Any help is greatly appreciated!
In short: How do I get a text file into an array which I can then swap around rows and columns to a desired order?
Upvotes: 1
Views: 276
Reputation: 18292
from copy import copy
f = open('input', 'r')
data = []
for line in f:
row = line.rstrip().split(' ')
data.append(row)
#collect labels, strip empty spaces
r = data.pop(0)
c = [row.pop(0) for row in data]
r.pop(0)
origrow, origcol = copy(r), copy(c)
r.sort()
c.sort()
newgrid = []
for row, rowtitle in enumerate(r):
fromrow = origrow.index(rowtitle)
newgrid.append(range(len(c)))
for col, coltitle in enumerate(c):
#We ask this len(row) times, so memoization
#might matter on a large matrix
fromcol = origcol.index(coltitle)
newgrid[row][col] = data[fromrow][fromcol]
print "\t".join([''] + r)
clabel = c.__iter__()
for line in newgrid:
print "\t".join([clabel.next()] + line)
Upvotes: 0
Reputation: 32521
Here's the start of a horrific Numpy version (use HYRY's answer...)
import numpy as np
with open("myfile", "r") as myfile:
lines = myfile.read().split("\n")
floats = [[float(item) for item in line.split()[1:]] for line in lines[1:]]
floats_transposed = np.array(floats).transpose().tolist()
Upvotes: 0
Reputation: 97301
I suggest you use pandas:
from StringIO import StringIO
import pandas as pd
data = StringIO("""b e a d c
b 0.0 0.1 0.3 0.2 0.5
e 0.1 0.0 0.4 0.9 0.3
a 0.3 0.4 0.0 0.7 0.6
d 0.2 0.9 0.7 0.0 0.1
c 0.5 0.3 0.6 0.1 0.0
""")
df = pd.read_csv(data, sep=" ")
print df.sort_index().sort_index(axis=1)
output:
a b c d e
a 0.0 0.3 0.6 0.7 0.4
b 0.3 0.0 0.5 0.2 0.1
c 0.6 0.5 0.0 0.1 0.3
d 0.7 0.2 0.1 0.0 0.9
e 0.4 0.1 0.3 0.9 0.0
Upvotes: 4