shytikov

Reputation: 9548

How to add data to pandas.DataFrame without recreating the instance

I'm trying to develop a custom DataFrame accessor for pandas and have run into an issue I'm not sure how to solve.

My accessor should load data from a custom source, and I was planning to assign these values to the DataFrame on which the accessor is called. But when I assign my newly created DataFrame to the instance the accessor was given, nothing happens.

I assume this is because I'm creating a new DataFrame instance instead of reusing the old one.
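
That assumption can be checked with a minimal sketch. `Holder` is a hypothetical stand-in for the accessor, not part of pandas; it only shows that assigning to `self.data` rebinds the attribute and leaves the caller's object alone.

```python
import pandas


class Holder:
    """Hypothetical stand-in for the accessor: it keeps a reference to a frame."""

    def __init__(self, obj: pandas.DataFrame) -> None:
        self.data = obj

    def rebind(self) -> None:
        # Points self.data at a brand-new object; the caller's frame is untouched.
        self.data = pandas.DataFrame({"X": [1, 2]})


original = pandas.DataFrame({"A": [10, 20]})
holder = Holder(original)
holder.rebind()

# The attribute now refers to a different object than the caller's frame.
rebound_is_new_object = holder.data is not original
# The caller's frame still has its original column.
original_columns = list(original.columns)
```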

Is there a graceful way to preserve the DataFrame instance and load the data into it?

Here is how I'm approaching it now:

import pandas
import numpy


@pandas.api.extensions.register_dataframe_accessor("test")
class TestAccessor:
    def __init__(self, obj: pandas.DataFrame) -> None:
        self.data = obj

    def read(self) -> None:
        # Creates dataframe with three columns `X, Y, Z`
        self.data = pandas.DataFrame(numpy.random.randint(0,100,size=(100, 3)), columns=list('XYZ'))


# Creates dataframe with three columns `A, B, C`
data = pandas.DataFrame(numpy.random.randint(0,100,size=(100, 3)), columns=list('ABC'))

# Supposed to load dataframe with columns `X, Y, Z`
data.test.read()

# Still shows dataframe with columns `A, B, C`
print(data)

Is there a way to fix this? What would be the best way to approach this problem?

Upvotes: 1

Views: 160

Answers (1)

Stef

Reputation: 30589

Not sure if this really makes much sense in practice, but here is a solution that achieves what you want in your example: drop all existing columns in place and assign the new columns:

import pandas
import numpy

@pandas.api.extensions.register_dataframe_accessor("test")
class TestAccessor:
    def __init__(self, obj: pandas.DataFrame) -> None:
        self.data = obj

    def read(self) -> None:
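        # Drop all existing columns in place so the same instance is kept
        self.data.drop(columns=self.data.columns, inplace=True)
        # Creates dataframe with three columns `X, Y, Z`
        new = pandas.DataFrame(numpy.random.randint(0,100,size=(100, 3)), columns=list('XYZ'))
        # Assign the new columns to the original instance (values are aligned on the index)
        self.data[new.columns] = new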

# Creates dataframe with three columns `A, B, C`
data = pandas.DataFrame(numpy.random.randint(0,100,size=(100, 3)), columns=list('ABC'))

# Supposed to load dataframe with columns `X, Y, Z`
data.test.read()

# Now shows dataframe with columns `X, Y, Z`
print(data)

Output:

     X   Y   Z
0   30  86  16
1   33  93  33
2   43  62  95
3   24  74   5
4   52  68  95
..  ..  ..  ..
95  89  54  90
96  35  78  20
97  68  11  17
98  29  68  44
99  33  73  11

[100 rows x 3 columns]
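
One caveat: the in-place assignment above aligns the new values on the existing index, so it works cleanly here because both frames have 100 rows with the default index. If the loaded data can have a different shape, a simpler alternative may be to have the accessor return the new frame and let the caller rebind the name. The following is only a sketch of that idea; the accessor name `loader` is made up for this example, and the random data stands in for the custom source.

```python
import numpy
import pandas


# "loader" is a hypothetical accessor name chosen for this sketch.
@pandas.api.extensions.register_dataframe_accessor("loader")
class LoaderAccessor:
    def __init__(self, obj: pandas.DataFrame) -> None:
        self._obj = obj

    def read(self) -> pandas.DataFrame:
        # Stand-in for loading from the custom source: random X, Y, Z columns.
        values = numpy.random.randint(0, 100, size=(100, 3))
        return pandas.DataFrame(values, columns=list("XYZ"))


# Creates dataframe with three columns `A, B, C`
data = pandas.DataFrame(numpy.random.randint(0, 100, size=(100, 3)), columns=list("ABC"))

# Rebind the name to the frame returned by the accessor
data = data.loader.read()
print(data.columns.tolist())
```

This does not preserve the original instance, so any other references to the old frame will still see columns `A, B, C`.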

Upvotes: 1
