Reputation: 1243
Can I store instances of a class in a pandas Series/DataFrame or a NumPy ndarray, just as I do in a list? Or do these libraries support only built-in types (numerics, strings)?
For example, I have a Point class with x, y coordinates, and I want to store Points in a Plane that would return the Point with the given coordinates.
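# my class
class MyPoint:
    def __init__(self, x, y):
        # store the values under private names so the read-only
        # properties below do not shadow or recurse into themselves
        self._x = x
        self._y = y

    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y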
Here I create instances:
first_point = MyPoint(1, 1)
second_point = MyPoint(2, 2)
I can store the instances in a list:
my_list = []
my_list.append(first_point)
my_list.append(second_point)
The problem with a list is that its indexes do not correspond to the x, y properties.
Dictionary/DataFrame approach:
Plane = {"x": [first_point.x, second_point.x],
         "y": [first_point.y, second_point.y],
         "some_reference/id_to_point_instance": ???}
Plane_pd = pd.DataFrame(Plane)
I've read posts saying that using the id of an instance as the third column value in a DataFrame could cause problems with the garbage collector.
Upvotes: 29
Views: 24474
Reputation: 49814
A pandas.DataFrame will gladly store Python objects.
Some test code to demonstrate...
import pandas as pd


class MyPoint:
    def __init__(self, x, y):
        self._x = x
        self._y = y

    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y


my_list = [MyPoint(1, 1), MyPoint(2, 2)]
print(my_list)

# one row per point: its coordinates plus the object itself
plane_pd = pd.DataFrame([[p.x, p.y, p] for p in my_list],
                        columns=list('XYO'))
print(plane_pd.dtypes)
print(plane_pd)
Output:
[<__main__.MyPoint object at 0x033D2AF0>, <__main__.MyPoint object at 0x033D2B10>]
X     int64
Y     int64
O    object
dtype: object
   X  Y                                        O
0  1  1  <__main__.MyPoint object at 0x033D2AF0>
1  2  2  <__main__.MyPoint object at 0x033D2B10>
Note that the two objects in the list are the same two objects in the DataFrame (the addresses match). Also note that the dtype of the O column is object.
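To tie this back to the question's goal of getting a point back by its coordinates, here is a minimal sketch. It assumes the coordinates are unique; the names plane, found and arr are illustrative and not from the original post. It indexes the frame by (X, Y) and checks that the object that comes back is the very instance that was stored. The last lines show that a NumPy array with dtype=object holds the instances in the same way.

```python
import numpy as np
import pandas as pd


class MyPoint:
    def __init__(self, x, y):
        self._x = x
        self._y = y

    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y


points = [MyPoint(1, 1), MyPoint(2, 2)]

# Build the frame, then use the coordinates as a two-level index
# (assumption: no two points share the same coordinates).
plane = pd.DataFrame(
    {"X": [p.x for p in points],
     "Y": [p.y for p in points],
     "O": points}
).set_index(["X", "Y"])

# Look a point up by its (x, y) pair; the result is the stored instance.
found = plane.loc[(2, 2), "O"]
print(found is points[1])

# A NumPy array with dtype=object stores references to the instances too.
arr = np.array(points, dtype=object)
print(arr[0] is points[0])
```

Because the column holds references to the objects themselves rather than their id values, the instances stay alive for as long as the DataFrame does.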
Upvotes: 30