Reputation: 811
I am trying to read an ASCII table into a NumPy/Pandas/Astropy array/dataframe/table in Python. Each row in the table looks something like this:
329444.6949 0.0124 -6.0124 3 97.9459 15 32507 303 7 3 4 8 2 7 HDC-13-O
The problem is that there is no clear separator/delimiter between the columns, so for some rows there is no space between two columns, like this:
332174.9289 0.0995 -6.3039 3 1708.1601219 30501 30336 333 37 136 H2CO
The web page calls these "card images" and describes the table format as follows:
The catalog data files are composed of 80-character card images, with one card image per spectral line. The format of each card image is: FREQ, ERR, LGINT, DR, ELO, GUP, TAG, QNFMT, QN', QN" (F13.4,F8.4, F8.4, I2,F10.4, I3, I7, I4, 6I2, 6I2)
I would really like a way to just use the format specifier given above. The only thing I found was NumPy's genfromtxt function. However, the following does not work:
np.genfromtxt('tablename', dtype='f13.4,f8.4,f8.4,i2,f10.4,i3,i7,i4,6i2,6i2')
Does anyone know how I could read this table into Python using the format specification given for each column?
Upvotes: 1
Views: 587
Reputation: 2542
You can use the fixed-width reader in Astropy. See: http://astropy.readthedocs.org/en/latest/io/ascii/fixed_width_gallery.html#fixedwidthnoheader. This does still require you to count the columns, but you could probably write a simple parser for the dtype expression you showed.
Unlike the pandas solution above (e.g. df['FREQ'] = df.data.str[0:13]), this will automatically determine the column type and give float and int columns in your case. The pandas version results in all str type columns, which is presumably not what you want.
To quote the doc example there:
>>> from astropy.io import ascii
>>> table = """
... #1       9         19               <== Column start indexes
... #|       |         |                <== Column start positions
... #<------><--------><------------->  <== Inferred column positions
...   John   555- 1234 192.168.1.10
...   Mary   555- 2134 192.168.1.123
...    Bob   555- 4527  192.168.1.9
...    Bill  555-9875  192.255.255.255
... """
>>> ascii.read(table,
...            format='fixed_width_no_header',
...            names=('Name', 'Phone', 'TCP'),
...            col_starts=(1, 9, 19),
...            )
<Table length=4>
Name   Phone         TCP
str4    str9        str15
---- --------- ---------------
John 555- 1234    192.168.1.10
Mary 555- 2134   192.168.1.123
 Bob 555- 4527     192.168.1.9
Bill  555-9875 192.255.255.255
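As for the "simple parser" mentioned above, here is a rough sketch of one, using only the standard library. It assumes the format string contains only F and I descriptors with an optional repeat count, as in the question. The helper names (parse_fortran_format, column_bounds, parse_card) are made up for this illustration, and the sample card is synthetic, built to match the format rather than copied from the catalog.

```python
import re

# Format string quoted in the question.
FORMAT = "(F13.4,F8.4, F8.4, I2,F10.4, I3, I7, I4, 6I2, 6I2)"


def parse_fortran_format(spec):
    """Expand a simple Fortran format into a list of (kind, width) pairs.

    Only F (float) and I (integer) descriptors are handled (an assumption).
    """
    fields = []
    for token in spec.replace(" ", "").strip("()").split(","):
        m = re.fullmatch(r"(\d*)([FI])(\d+)(?:\.\d+)?", token, flags=re.IGNORECASE)
        if m is None:
            raise ValueError(f"unsupported format token: {token!r}")
        repeat = int(m.group(1) or 1)
        fields.extend([(m.group(2).upper(), int(m.group(3)))] * repeat)
    return fields


def column_bounds(fields):
    """Return 0-based column starts and inclusive column ends."""
    starts, ends, pos = [], [], 0
    for _, width in fields:
        starts.append(pos)
        ends.append(pos + width - 1)
        pos += width
    return starts, ends


def parse_card(line, fields):
    """Slice one card by width; blank fields become None."""
    values, pos = [], 0
    for kind, width in fields:
        text = line[pos:pos + width].strip()
        pos += width
        if not text:
            values.append(None)
            continue
        try:
            values.append(float(text) if kind == "F" else int(text))
        except ValueError:
            values.append(text)  # keep non-numeric content as a string
    return values


fields = parse_fortran_format(FORMAT)
starts, ends = column_bounds(fields)

# Synthetic card (made-up values) in which ELO and GUP touch with no space.
sample = (
    f"{332174.9289:13.4f}{0.0995:8.4f}{-6.3039:8.4f}{3:2d}"
    f"{1708.1601:10.4f}{219:3d}{30501:7d}{303:4d}"
    + "".join(f"{q:2d}" for q in (1, 2, 3)) + " " * 6
    + "".join(f"{q:2d}" for q in (4, 5, 6)) + " " * 6
)
row = parse_card(sample, fields)
print(row[:8])
```

The starts and ends computed this way should be usable as col_starts and col_ends for the Astropy fixed_width_no_header reader (its col_ends are inclusive), or the widths alone could be handed to pandas.read_fwf via its widths argument. I have not tested either against the real catalog file.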
Upvotes: 3