Reputation: 9548
I'm trying to develop a custom DataFrame accessor for pandas
and have run into an issue I'm not sure how to solve.
My accessor should load data from a custom source, and I was planning to assign the loaded values to the DataFrame
on which the accessor is called. But when I assign my newly created DataFrame to the instance the accessor was given, nothing happens.
I assume that's because I'm creating a new DataFrame instance instead of reusing the old one.
Is there a graceful way to keep the same DataFrame instance and load the data into it?
Here is how I'm approaching it now:
import pandas
import numpy


@pandas.api.extensions.register_dataframe_accessor("test")
class TestAccessor:
    def __init__(self, obj: pandas.DataFrame) -> None:
        self.data = obj

    def read(self) -> None:
        # Creates dataframe with three columns `X, Y, Z`
        self.data = pandas.DataFrame(numpy.random.randint(0, 100, size=(100, 3)), columns=list('XYZ'))


# Creates dataframe with three columns `A, B, C`
data = pandas.DataFrame(numpy.random.randint(0, 100, size=(100, 3)), columns=list('ABC'))
# Supposed to load dataframe with columns `X, Y, Z`
data.test.read()
# Will show dataframe with columns `A, B, C`
print(data)
Is there a way to fix this? What would be the best way to approach this problem?
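The suspicion above can be checked without the accessor machinery. The sketch below uses a hypothetical `Holder` class (not part of pandas) that stores a reference to the frame the same way the accessor does. It compares rebinding the attribute with mutating the object it points to. The names `Holder`, `rebind` and `mutate` are made up for illustration.

```python
import pandas as pd


class Holder:
    """Hypothetical stand-in for an accessor: it only stores a reference."""

    def __init__(self, obj: pd.DataFrame) -> None:
        self.data = obj

    def rebind(self) -> None:
        # Points self.data at a brand-new object; the caller's frame is untouched.
        self.data = pd.DataFrame({"X": [1, 2]})

    def mutate(self) -> None:
        # Modifies the very object the caller also holds a reference to.
        self.data["X"] = [1, 2]


original = pd.DataFrame({"A": [10, 20]})

holder = Holder(original)
holder.rebind()
columns_after_rebind = list(original.columns)
same_object_after_rebind = holder.data is original

holder = Holder(original)
holder.mutate()
columns_after_mutate = list(original.columns)

print(columns_after_rebind, same_object_after_rebind, columns_after_mutate)
```

Only the `mutate` path changes what the caller sees, which matches the behaviour described in the question: assigning to `self.data` just changes which object the accessor's attribute refers to.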
Upvotes: 1
Views: 160
Reputation: 30589
Not sure if this really makes much sense in practice, but here is a solution that achieves what you want in your example: drop all existing columns in place and assign the new columns:
import pandas
import numpy


@pandas.api.extensions.register_dataframe_accessor("test")
class TestAccessor:
    def __init__(self, obj: pandas.DataFrame) -> None:
        self.data = obj

    def read(self) -> None:
        # Drop all existing columns in place so the same DataFrame object is kept
        self.data.drop(columns=self.data.columns, inplace=True)
        # Creates dataframe with three columns `X, Y, Z`
        new = pandas.DataFrame(numpy.random.randint(0, 100, size=(100, 3)), columns=list('XYZ'))
        # Assign the new columns to the original (now empty) dataframe
        self.data[new.columns] = new


# Creates dataframe with three columns `A, B, C`
data = pandas.DataFrame(numpy.random.randint(0, 100, size=(100, 3)), columns=list('ABC'))
# Supposed to load dataframe with columns `X, Y, Z`
data.test.read()
# Now shows dataframe with columns `X, Y, Z`
print(data)
Output:
     X   Y   Z
0   30  86  16
1   33  93  33
2   43  62  95
3   24  74   5
4   52  68  95
..  ..  ..  ..
95  89  54  90
96  35  78  20
97  68  11  17
98  29  68  44
99  33  73  11

[100 rows x 3 columns]
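If the caller is free to reassign its variable, a more conventional alternative is to have the accessor return the loaded frame instead of modifying the one it was given. This is only a sketch under that assumption: the accessor name `loader` and the class name `LoaderAccessor` are hypothetical, and random data stands in for the custom source.

```python
import numpy as np
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("loader")  # hypothetical accessor name
class LoaderAccessor:
    def __init__(self, obj: pd.DataFrame) -> None:
        self._obj = obj

    def read(self) -> pd.DataFrame:
        # Stand-in for loading from the custom source: returns a new frame
        # and leaves the frame the accessor was called on unchanged.
        return pd.DataFrame(
            np.random.randint(0, 100, size=(100, 3)), columns=list("XYZ")
        )


# Creates dataframe with three columns `A, B, C`
data = pd.DataFrame(np.random.randint(0, 100, size=(100, 3)), columns=list("ABC"))
# The caller rebinds its own name to the returned frame
data = data.loader.read()
print(data.columns.tolist())
```

This avoids in-place mutation entirely. It also sidesteps one thing to watch for with the in-place version: as far as I can tell, assigning a DataFrame to columns aligns on the index, so that approach relies on the existing frame and the loaded frame sharing the same row labels, as they do in the example.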
Upvotes: 1