CromeX

Python numpy.loadtxt with varied string entries but known line format

I am looking into the limits of loadtxt specifically. I have a text file with a known line format, whose fields are separated by ¤ (chr(164)):

# Sample header for python loadtxt
Very random text:¤mixed with¤strings¤numbers
300057¤9989¤34956¤1
110087¤9189¤24466¤4
# EOF

I can read everything in as strings (of unknown length) and convert them to integers and floats later. This is what I have:

import numpy as np
txtdata = np.loadtxt('Mytxtfile.txt',delimiter=chr(164),comments="#",dtype='str')
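As a rough sketch of that read-as-strings-then-convert route, assuming the file contents shown above: the data is inlined with io.StringIO (the name SAMPLE is made up here) so the snippet runs without Mytxtfile.txt. The text row is kept as strings and the remaining rows are cast with astype(int).

```python
import io

import numpy as np

# Hypothetical inline copy of Mytxtfile.txt; '\u00a4' is chr(164), the delimiter.
SAMPLE = (
    "# Sample header for python loadtxt\n"
    "Very random text:\u00a4mixed with\u00a4strings\u00a4numbers\n"
    "300057\u00a49989\u00a434956\u00a41\n"
    "110087\u00a49189\u00a424466\u00a44\n"
    "# EOF\n"
)

# Read every field as a string; lines starting with '#' are skipped as comments.
raw = np.loadtxt(io.StringIO(SAMPLE), delimiter=chr(164), comments="#", dtype=str)

header = raw[0].tolist()       # first non-comment row: the text labels
numbers = raw[1:].astype(int)  # remaining rows: parse the digit strings as ints

print(header)
print(numbers)
```

This still yields two separate objects, because a regular ndarray uses a single dtype for all of its elements.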

However, I would like to know whether it is possible to read the data directly into a multidimensional array, such as:

[['Very random text:', 'mixed with', 'strings', 'numbers'],
 [300057, 9989, 34956, 1],
 [110087, 9189, 24466, 4]]

I tried this dtype argument, without success:

dtype=[('a', 'str'),('b','int'),('c','int')]


Answers (1)

unutbu

import numpy as np
# skiprows counts comment lines too: skip the '#' line and the text header row
txtdata = np.loadtxt(
    'Mytxtfile.txt', delimiter=chr(164), comments="#", skiprows=2,
    dtype=[('a', '|S6'), ('b', '<i4'), ('c', '<i4'), ('d', '<i4')])

Your sample data has 4 columns, but the dtype you tried lists only 3 fields. To specify the dtype explicitly, you need one field per column, something like:

dtype=[('a', '|S6'), ('b', '<i4'), ('c', '<i4'), ('d', '<i4')]
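A runnable sketch of the structured-dtype approach on the sample data follows, again inlining the file with io.StringIO (SAMPLE is a made-up name). It uses skiprows=2 because loadtxt's skiprows counts raw lines, comment lines included, so both the '#' line and the text header row have to be skipped before the integer fields are parsed.

```python
import io

import numpy as np

# Hypothetical inline copy of Mytxtfile.txt; '\u00a4' is chr(164).
SAMPLE = (
    "# Sample header for python loadtxt\n"
    "Very random text:\u00a4mixed with\u00a4strings\u00a4numbers\n"
    "300057\u00a49989\u00a434956\u00a41\n"
    "110087\u00a49189\u00a424466\u00a44\n"
    "# EOF\n"
)

# One field per column: a 6-byte string followed by three little-endian int32s.
row_dtype = [('a', '|S6'), ('b', '<i4'), ('c', '<i4'), ('d', '<i4')]

# skiprows=2 skips the '#' comment line and the non-numeric text header.
txtdata = np.loadtxt(io.StringIO(SAMPLE), delimiter=chr(164), comments="#",
                     skiprows=2, dtype=row_dtype)

# The result is a 1-D structured array; each field is accessed by name.
print(txtdata['a'])  # string field, returned as bytes under Python 3
print(txtdata['b'], txtdata['c'], txtdata['d'])
```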

Note that NumPy does not have a variable-width string dtype, so you have to specify the width of each string field in advance. For example, '|S6' specifies a 6-byte string dtype (under Python 3 its values come back as bytes, e.g. b'300057').
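To illustrate what the fixed width means in practice (a small example, not part of the original answer): an 'S6' element holds at most 6 bytes, so the six-digit values fit exactly, while a longer string such as the text header field is cut off without raising an error.

```python
import numpy as np

s6 = np.dtype('|S6')
print(s6.itemsize)  # 6 bytes per element, the same for every value

# Six-character values fit exactly and are stored as bytes.
ids = np.array(['300057', '110087'], dtype=s6)

# Longer values are silently truncated to the declared width.
label = np.array(['Very random text:'], dtype=s6)

print(ids)
print(label)
```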

If you do not know in advance how many bytes may be in the string column(s), then it may be more convenient to use numpy.genfromtxt:

# skip_header=1 keeps the '#' line from being read as the field names
txtdata = np.genfromtxt('Mytxtfile.txt', delimiter=chr(164), comments="#",
                        skip_header=1, names=True, dtype=None)

dtype=None tells genfromtxt to infer each column's dtype from the data, and names=True takes the field names from the text header line.
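A runnable sketch of the genfromtxt route on the sample, with the file inlined via io.StringIO (SAMPLE is a made-up name). One detail worth flagging: with names=True, genfromtxt reads the names from the first line after skip_header, and that line may itself begin with the comment character, so this sketch passes skip_header=1 to stop the '# Sample header' line from being used as the names.

```python
import io

import numpy as np

# Hypothetical inline copy of Mytxtfile.txt; '\u00a4' is chr(164).
SAMPLE = (
    "# Sample header for python loadtxt\n"
    "Very random text:\u00a4mixed with\u00a4strings\u00a4numbers\n"
    "300057\u00a49989\u00a434956\u00a41\n"
    "110087\u00a49189\u00a424466\u00a44\n"
    "# EOF\n"
)

data = np.genfromtxt(io.StringIO(SAMPLE), delimiter=chr(164), comments="#",
                     skip_header=1, names=True, dtype=None)

# Field names come from the text header row; genfromtxt sanitizes them
# (for example, spaces are replaced with underscores).
print(data.dtype.names)
for name in data.dtype.names:
    print(name, data[name])  # each column with its inferred integer dtype
```

No string widths had to be declared here; the trade-off is that the header row becomes field names rather than a row of data.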
