omryjs

Reputation: 29

NumPy doesn't recognize data types in conversion

I would be grateful if you could help me with a solution you gave a while back in this question: "Converting a list of ints, tuples into an numpy array"

As you may recall, you explained a method of converting a tuple to a NumPy array. I'm working on a data-mining project, and I found that the fastest way to collect the data is with tuples, but for anything beyond recording input I need a NumPy array. So I looked up your solution and it kind of worked; the problem is with the data types. My data looks like this:

t1=[[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],[datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],[datetime.datetime(2013, 9, 2, 20, 54, 51), 'second']]

When I try to modify your code like so:

A = np.array([tuple(i) for i in t1],dtype=[('ReportTime',datetime.datetime.__class__),('activity',str.__class__)])

NumPy doesn't recognize the data types. Am I passing the wrong data types? Thank you for your time.

Upvotes: 1

Views: 170

Answers (2)

ericmjl

Reputation: 14714

Since you're working on a data-mining project, have you considered using pandas instead?

Here's an example of how to convert a list of tuples into a pandas DataFrame. I've left in a few common beginner errors I made when I first started out with pandas, to give you an idea of what you can and cannot do.

In [1]: import pandas as pd

In [2]: data = [(1, 2), (1, 5), (2, 3), (2, 2)]

In [3]: pd.DataFrame(data)
Out[3]: 
   0  1
0  1  2
1  1  5
2  2  3
3  2  2

In [4]: pd.columns[0] = 'column 1'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-c313e6b0cb87> in <module>()
----> 1 pd.columns[0] = 'column 1'

AttributeError: 'module' object has no attribute 'columns'

In [5]: df = pd.DataFrame(data)

In [6]: df
Out[6]: 
   0  1
0  1  2
1  1  5
2  2  3
3  2  2

In [7]: df.columns
Out[7]: Int64Index([0, 1], dtype=int64)

In [8]: df.columns[1] = "column 2"
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-8-76ee806aec72> in <module>()
----> 1 df.columns[1] = "column 2"

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.12.0-py2.7-macosx-10.6-intel.egg/pandas/core/index.pyc in __setitem__(self, key, value)
    328 
    329     def __setitem__(self, key, value):
--> 330         raise Exception(str(self.__class__) + ' object is immutable')
    331 
    332     def __getitem__(self, key):

Exception: <class 'pandas.core.index.Int64Index'> object is immutable

In [9]: df.columns = ["column 1", "column 2"]

In [10]: df
Out[10]: 
   column 1  column 2
0         1         2
1         1         5
2         2         3
3         2         2

In [11]: exit()
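
If only one label needs to change, a small sketch using DataFrame.rename avoids replacing the whole list of column names. The name renamed is purely illustrative, and note that rename returns a new DataFrame rather than modifying df in place.

```python
import pandas as pd

data = [(1, 2), (1, 5), (2, 3), (2, 2)]
df = pd.DataFrame(data)

# The Index object itself is immutable, so instead of assigning to
# df.columns[1], map the old label to the new one.
# rename() returns a new DataFrame; df keeps its original labels.
renamed = df.rename(columns={1: "column 2"})

print(list(renamed.columns))
print(list(df.columns))
```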

Specifically with your example:

In [1]: import pandas as pd

In [3]: import datetime

In [4]: t1=[[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],[datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],[datetime.datetime(2013, 9, 2, 20, 54, 51), 'second']]

In [5]: t1
Out[5]: 
[[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],
 [datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],
 [datetime.datetime(2013, 9, 2, 20, 54, 51), 'second']]

In [6]: df = pd.DataFrame(t1)

In [7]: df
Out[7]: 
                    0       1
0 2013-10-01 20:54:51    last
1 2013-08-01 20:54:51   First
2 2013-09-02 20:54:51  second
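
As a sketch of a possible next step, the columns can be named when the DataFrame is built, and to_records gives back a NumPy record array if one is still needed. The column names ReportTime and activity are taken from the question; df_sorted and records are illustrative names only.

```python
import datetime
import pandas as pd

t1 = [
    [datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],
    [datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],
    [datetime.datetime(2013, 9, 2, 20, 54, 51), 'second'],
]

# Name the columns up front instead of relabelling afterwards.
df = pd.DataFrame(t1, columns=["ReportTime", "activity"])

# pandas infers a datetime column, so sorting by time works directly.
df_sorted = df.sort_values("ReportTime")

# Convert to a NumPy record array, dropping the row index.
records = df.to_records(index=False)

print(df.dtypes)
print(records.dtype.names)
```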

Upvotes: 3

Iguananaut

Reputation: 23356

Don't use .__class__. If you're unsure, just look at what it actually does:

>>> import datetime
>>> datetime.datetime.__class__
<class 'type'>
>>> str.__class__
<class 'type'>

datetime.datetime and str are already classes, so you can pass them to NumPy directly and let it determine the appropriate dtype for each: str maps to a string dtype, while datetime.datetime, which has no dedicated NumPy dtype of its own, falls back to the generic object dtype.

str.__class__, on the other hand, is the class of the class str (Python classes are objects too). The class of most classes is type, unless the class was defined with a custom metaclass.
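
A quick way to check what NumPy makes of each class is to pass it to np.dtype. This is only a sketch, written for Python 3; the variable names dt_dtype and str_dtype are illustrative, and the code looks at the dtype kind for str rather than assuming an exact printed form.

```python
import datetime
import numpy as np

# A Python class with no dedicated NumPy dtype falls back to object.
dt_dtype = np.dtype(datetime.datetime)
print(dt_dtype == np.dtype(object))

# str maps to a Unicode string dtype (kind 'U') on Python 3.
str_dtype = np.dtype(str)
print(str_dtype.kind)

# By contrast, .__class__ of a class is its metaclass, normally type.
print(datetime.datetime.__class__ is type, str.__class__ is type)
```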
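
For the array in the question, one possible version spells the field dtypes out explicitly. This is a sketch under assumptions that are not in the question: 'datetime64[s]' (second resolution) and 'U10' (strings of up to 10 characters) are choices made here for illustration, and longer strings would be silently truncated.

```python
import datetime
import numpy as np

t1 = [
    [datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],
    [datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],
    [datetime.datetime(2013, 9, 2, 20, 54, 51), 'second'],
]

# Each record must be a tuple for a structured dtype.
# Assumed field types: datetime64 with second resolution, and a
# fixed-width Unicode string of at most 10 characters.
A = np.array(
    [tuple(i) for i in t1],
    dtype=[("ReportTime", "datetime64[s]"), ("activity", "U10")],
)

# Fields are accessed by name; here the rows are reordered by time.
A_sorted = A[np.argsort(A["ReportTime"])]

print(A["ReportTime"])
print(A_sorted["activity"])
```

If the timestamps should stay as plain datetime.datetime objects, the first field could use the object dtype instead, at the cost of losing NumPy's native datetime arithmetic.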

Upvotes: 1
