Loading a table in numpy with row- and column-indices, like in R?

Question

I would like to load a table in numpy, so that the first row and first column would be considered text labels. Something equivalent to this R code:

read.table("filename.txt", header=TRUE)

Where the file is a delimited text file like this:

   A    B    C    D
X  5    4    3    2
Y  1    0    9    9
Z  8    7    6    5

So that, once it is read in, I will have an array:

[[5,4,3,2],
 [1,0,9,9],
 [8,7,6,5]]

Along with some sort of labels: rownames ["X","Y","Z"] and colnames ["A","B","C","D"]

Is there such a class / mechanism?

Joe Kington · Accepted Answer

Numpy arrays aren't perfectly suited to table-like structures. However, pandas.DataFrames are.
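
To illustrate why plain numpy is a weaker fit, here is a minimal sketch of a numpy-only approach. It assumes the whitespace-delimited layout from the question and keeps the labels in separate Python lists; the names `colnames`, `rownames`, and `data` are only illustrative.

```python
from io import StringIO
import numpy as np

# Simulated file with the layout from the question.
text = StringIO("""A    B    C    D
X  5    4    3    2
Y  1    0    9    9
Z  8    7    6    5""")

# The first line holds the column labels.
colnames = text.readline().split()

# Each remaining line is a row label followed by the numeric values.
rows = [line.split() for line in text if line.strip()]
rownames = [fields[0] for fields in rows]
data = np.array([fields[1:] for fields in rows], dtype=int)

# A label-based lookup has to be done by hand via list positions.
value = data[rownames.index('Y'), colnames.index('D')]
print(value)
```

This works, but the labels live outside the array, so any slicing or reordering has to be mirrored manually in the lists.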

For what you're wanting, use pandas.

For your example, you'd do

data = pandas.read_csv('filename.txt', sep=r'\s+', index_col=0)

As a more complete example (using StringIO to simulate your file):

from io import StringIO  # Python 3 (in Python 2 this was "from StringIO import StringIO")
import pandas as pd

f = StringIO("""A    B    C    D
X  5    4    3    2
Y  1    0    9    9
Z  8    7    6    5""")
# sep=r'\s+' splits on runs of whitespace; index_col=0 uses the first column as row labels
x = pd.read_csv(f, sep=r'\s+', index_col=0)

print('The DataFrame:')
print(x)

print('Selecting a column')
print(x['D'])  # or "x.D" if the name is a valid Python identifier

print('Selecting a row')
print(x.loc['Y'])

This yields:

The DataFrame:
   A  B  C  D
X  5  4  3  2
Y  1  0  9  9
Z  8  7  6  5
Selecting a column
X    2
Y    9
Z    5
Name: D, dtype: int64
Selecting a row
A    1
B    0
C    9
D    9
Name: Y, dtype: int64
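
The row and column selections above can also be combined to pick out a single cell by its two labels. The following is a small sketch using the same simulated file; `cell` and `sub` are just illustrative names.

```python
from io import StringIO
import pandas as pd

f = StringIO("""A    B    C    D
X  5    4    3    2
Y  1    0    9    9
Z  8    7    6    5""")
x = pd.read_csv(f, sep=r'\s+', index_col=0)

# Look up one value by row label and column label.
cell = x.loc['Y', 'D']
print(cell)

# Select a labeled sub-table: rows X and Z, columns A and B.
sub = x.loc[['X', 'Z'], ['A', 'B']]
print(sub)
```

For repeated scalar lookups, `x.at['Y', 'D']` does the same job as `x.loc['Y', 'D']`.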

Also, as @DSM pointed out, it's very useful to know about things like DataFrame.values (or DataFrame.to_numpy() in newer pandas versions) or DataFrame.to_records() if you do need a "raw" numpy array. (pandas is built on top of numpy. Loosely speaking, each column of a DataFrame is stored as a 1D numpy array.)

For example:

In [2]: x.values
Out[2]:
array([[5, 4, 3, 2],
       [1, 0, 9, 9],
       [8, 7, 6, 5]])

In [3]: x.to_records()
Out[3]:
rec.array([('X', 5, 4, 3, 2), ('Y', 1, 0, 9, 9), ('Z', 8, 7, 6, 5)],
      dtype=[('index', 'O'), ('A', '
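
To get exactly what the question asks for (a plain array plus separate row and column name lists), something along these lines should work. This is a sketch that assumes a pandas version providing `DataFrame.to_numpy()`; the variable names `rownames`, `colnames`, and `values` are illustrative.

```python
from io import StringIO
import pandas as pd

f = StringIO("""A    B    C    D
X  5    4    3    2
Y  1    0    9    9
Z  8    7    6    5""")
x = pd.read_csv(f, sep=r'\s+', index_col=0)

rownames = x.index.tolist()     # row labels taken from the first column
colnames = x.columns.tolist()   # column labels taken from the header line
values = x.to_numpy()           # the numeric data as a 2D numpy array

print(rownames)
print(colnames)
print(values.shape)
```

Note that once the array is pulled out, the labels are no longer attached to it, so keeping the DataFrame around is usually more convenient.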
