Magnus Persson

Reputation: 811

Reading table data (card images), with format specifier given, into Python

I am trying to read an ASCII table into a NumPy/pandas/Astropy array/dataframe/table in Python. Each row in the table looks something like this:

  329444.6949     0.0124    -6.0124 3   97.9459 15  32507 303 7 3 4       8 2 7          HDC-13-O

The problem is that there is no clear separator/delimiter between the columns, so for some rows there is no space between two columns, like this:

  332174.9289     0.0995    -6.3039 3 1708.1601219  30501 30336 333      37 136          H2CO

The web page calls these "card images" and describes the table format as follows:

The catalog data files are composed of 80-character card images, with one card image per spectral line. The format of each card image is: FREQ, ERR, LGINT, DR, ELO, GUP, TAG, QNFMT, QN', QN" (F13.4,F8.4, F8.4, I2,F10.4, I3, I7, I4, 6I2, 6I2)

I would really like a way to just use the format specifier given above. The only thing I found was NumPy's genfromtxt function. However, the following does not work:

np.genfromtxt('tablename', dtype='f13.4,f8.4,f8.4,i2,f10.4,i3,i7,i4,6i2,6i2')

Does anyone know how I could read this table into Python using the format specification given for each column?

Upvotes: 1

Views: 587

Answers (1)

Tom Aldcroft

Reputation: 2542

You can use the fixed-width reader in Astropy. See: http://astropy.readthedocs.org/en/latest/io/ascii/fixed_width_gallery.html#fixedwidthnoheader. This does still require you to count the columns, but you could probably write a simple parser for the dtype expression you showed.

Unlike the pandas approach of slicing strings (e.g. df['FREQ'] = df.data.str[0:13]), this will automatically determine the column types and give float and int columns in your case. The pandas version results in all str type columns, which is presumably not what you want.

To quote the doc example there:

>>> from astropy.io import ascii
>>> table = """
... #1       9        19                <== Column start indexes
... #|       |         |                <== Column start positions
... #<------><--------><------------->  <== Inferred column positions
...   John   555- 1234 192.168.1.10
...   Mary   555- 2134 192.168.1.123
...    Bob   555- 4527  192.168.1.9
...    Bill  555-9875  192.255.255.255
... """
>>> ascii.read(table,
...            format='fixed_width_no_header',
...            names=('Name', 'Phone', 'TCP'),
...            col_starts=(1, 9, 19),
...            )
<Table length=4>
Name   Phone         TCP
str4    str9        str15
---- --------- ---------------
John 555- 1234    192.168.1.10
Mary 555- 2134   192.168.1.123
 Bob 555- 4527     192.168.1.9
Bill  555-9875 192.255.255.255
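
The "simple parser" for the format expression mentioned earlier could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names (parse_fortran_format, column_bounds, parse_line) are made up for this example, only the F and I descriptors with an optional repeat count are handled, and the sample row is synthetic, built to match the widths exactly using the values from the first row in the question. The resulting col_starts and col_ends lists are 0-based with inclusive ends, so they should be usable as the col_starts and col_ends arguments of the fixed_width_no_header reader.

```python
import re

# Format string copied from the catalog description in the question.
FORMAT = "F13.4,F8.4, F8.4, I2,F10.4, I3, I7, I4, 6I2, 6I2"


def parse_fortran_format(spec):
    """Expand a format string into a flat list of (kind, width) pairs.

    Assumption: only [repeat]F<w>.<d> and [repeat]I<w> descriptors occur.
    """
    fields = []
    for token in spec.split(","):
        match = re.fullmatch(r"(\d*)([FI])(\d+)(?:\.\d+)?", token.strip().upper())
        if match is None:
            raise ValueError(f"unsupported descriptor: {token!r}")
        repeat = int(match.group(1) or 1)  # "6I2" means six I2 fields
        fields.extend([(match.group(2), int(match.group(3)))] * repeat)
    return fields


def column_bounds(fields):
    """Return 0-based start and inclusive end positions for each field."""
    starts, ends, pos = [], [], 0
    for _, width in fields:
        starts.append(pos)
        ends.append(pos + width - 1)
        pos += width
    return starts, ends


def parse_line(line, fields):
    """Slice one card image by width and convert each field.

    Blank fields become None; fields that fail numeric conversion are
    kept as stripped strings rather than raising.
    """
    values, pos = [], 0
    for kind, width in fields:
        text = line[pos:pos + width].strip()
        pos += width
        if not text:
            values.append(None)
            continue
        try:
            values.append(float(text) if kind == "F" else int(text))
        except ValueError:
            values.append(text)
    return values


fields = parse_fortran_format(FORMAT)
col_starts, col_ends = column_bounds(fields)

# Synthetic row that follows the format exactly (not copied from the file).
sample = (
    f"{329444.6949:13.4f}{0.0124:8.4f}{-6.0124:8.4f}{3:2d}"
    f"{97.9459:10.4f}{15:3d}{32507:7d}{303:4d}"
    f"{7:2d}{3:2d}{4:2d}" + " " * 6 + f"{8:2d}{2:2d}{7:2d}" + " " * 6
)
row = parse_line(sample, fields)
print(row)
```

Slicing by width rather than splitting on whitespace is what lets adjacent fields with no space between them still be separated. If the real file has trailing text after the last field, as the rows in the question appear to, it is simply ignored by parse_line.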

Upvotes: 3
