kamfulebu

Reputation: 229

'data type not understood' error when defining a list of column names

I'm trying to assign column names using np.dtype.

I have defined a list of names:

print fieldNameList

[u'A', u'B', u'C', u'D', u'E', u'F', u'G', u'H', u'I', u'J', u'K', u'L', u'M', u'N', u'S']

Then I convert the list to a string:

field_name = ', '.join(["('%s', '<f8')" % w for w in fieldNameList])

print field_name

('A', '<f8'), ('B', '<f8'), ('C', '<f8'), ('D', '<f8'), ('E', '<f8'), ('F', '<f8'), ('G', '<f8'), ('H', '<f8'), ('I', '<f8'), ('J', '<f8'), ('K', '<f8'), ('L', '<f8'), ('M', '<f8'), ('N', '<f8'), ('S', '<f8')

Then I pass it to np.dtype:

inarray = np.array(tup1,
                np.dtype([field_name]))

I get an error:

np.dtype([field_name]))
TypeError: data type not understood

When I paste the generated contents of field_name in place of the variable, I get the desired result:

inarray = np.array(tup1,
            np.dtype([('A', '<f8'), ('B', '<f8'), ('C', '<f8'), ('D', '<f8'), ('E', '<f8'), ('F', '<f8'), ('G', '<f8'), ('H', '<f8'), ('I', '<f8'), ('J', '<f8'), ('K', '<f8'), ('L', '<f8'), ('M', '<f8'), ('N', '<f8'), ('S', '<f8')]))

The number and names of the columns depend on the input table, which is defined by the user. That is why they cannot be hard-coded in the script.

Does anyone have an idea how to solve this problem? Thanks in advance.

Upvotes: 3

Views: 5706

Answers (2)

moooeeeep

Reputation: 32512

I just stumbled across this issue myself.

When you define a field name from a unicode object like this, you receive an error (as explained in the other answer):

>>> np.dtype([(u'foo', 'f')])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: data type not understood

Interestingly, when you create the same dtype object using the dictionary method, it works:

>>> np.dtype({'names': [u"foo"], 'formats': ["f"]})
dtype([(u'foo', '<f4')])
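As a rough sketch of how the dictionary form could be used to build the dtype from a list of names at run time (the names `field_names` and `rows` are illustrative and not taken from the question):

```python
import numpy as np

# Illustrative column names; in the question these come from the input table.
field_names = [u'A', u'B', u'C']

# Dictionary form: parallel lists of field names and field formats.
dt = np.dtype({'names': field_names,
               'formats': ['<f8'] * len(field_names)})

# One row of made-up values, given as a tuple so it maps onto the fields.
rows = [(1, 2, 3)]
inarray = np.array(rows, dtype=dt)

print(inarray.dtype.names)
print(inarray['B'])
```

Each column can then be accessed by name, for example `inarray['B']`.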

For the record: I'm using Python 2.7.6 with NumPy 1.13.1. This issue doesn't occur with Python 3.4.3.

Here is the corresponding entry in the NumPy issue tracker on GitHub: https://github.com/numpy/numpy/issues/2407

Upvotes: 1

unutbu

Reputation: 879601

>>> field_name = ', '.join(["('%s', '<f8')" % w for w in fieldNameList])
>>> field_name
"('A', '<f8'), ('B', '<f8'), ('C', '<f8')"

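makes field_name a single string, so [field_name] is a list containing one string rather than a list of (name, format) tuples. NumPy does not evaluate that string as Python code. Instead, specify the NumPy dtype directly as a list of tuples: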

>>> [(w, '<f8') for w in fieldNameList]
[('A', '<f8'), ('B', '<f8'), ('C', '<f8')]

import numpy as np

fieldNameList = [u'A', u'B', u'C']
fieldNameList = [name.encode('utf-8') for name in fieldNameList]        # 1 (Python 2 only)
tup1 = [(1,2,3)]
inarray = np.array(tup1, dtype=[(w, '<f8') for w in fieldNameList])

yields

array([(1.0, 2.0, 3.0)], 
      dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
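To make the difference between the two approaches concrete, here is a minimal sketch that builds both objects and compares them. The variable names (`as_string`, `as_tuples`, `string_dtype_ok`) are illustrative, and the exact exception message for the string version may differ between NumPy versions:

```python
import numpy as np

field_names = ['A', 'B', 'C']

# What the question builds: one long string that only looks like a list.
as_string = ', '.join("('%s', '<f8')" % w for w in field_names)

# What np.dtype expects: a real list of (name, format) tuples.
as_tuples = [(w, '<f8') for w in field_names]

print(type(as_string).__name__, len([as_string]))   # a single str in a list
print(type(as_tuples[0]).__name__, len(as_tuples))  # one tuple per column

# Wrapping the string in a list does not turn it into tuples,
# so NumPy rejects it.
try:
    np.dtype([as_string])
    string_dtype_ok = True
except (TypeError, ValueError) as exc:
    string_dtype_ok = False
    print('rejected:', exc)

# The list of tuples is accepted.
dt = np.dtype(as_tuples)
print(dt.names)
```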

  1. Note that on Python 2, fieldNameList must be a list of byte strings, not unicode. If fieldNameList is a list of unicode objects, you'll need to encode them first. On Python 3 this step is not needed, because ordinary str names are accepted.
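If the same script has to run on both Python 2 and Python 3, one possible approach is to normalize every name to the interpreter's native `str` type before building the dtype. The helper `to_native_str` below is hypothetical (it is not part of NumPy), and the sketch assumes the names are UTF-8 encodable:

```python
import numpy as np


def to_native_str(name, encoding='utf-8'):
    """Return name as the interpreter's native str type (hypothetical helper)."""
    if isinstance(name, str):
        return name                    # already the native type
    if str is bytes:
        return name.encode(encoding)   # Python 2: unicode -> byte string
    return name.decode(encoding)       # Python 3: bytes -> text string


# Illustrative mix of text and byte-string names.
raw_names = [u'A', b'B', 'C']
dt = np.dtype([(to_native_str(n), '<f8') for n in raw_names])

inarray = np.array([(1, 2, 3)], dtype=dt)
print(inarray.dtype.names)
```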

Upvotes: 2
