Reputation: 31

Python: appending to a DataFrame from external methods

In my project I have a Strategy class that stores its logs in a DataFrame, LOGS. A Strategy is a succession of Block instances, and each Block can call writeLog().

I want each Block to point to Strategy.LOGS so that it can append rows to it.

Here is my minimal code:

import pandas as pd

class Block:
    def __init__(self, logger):
        self.logger = logger
        
    def check(self):
        self.writeLog('test!')
        
    def writeLog(self, message):
        self.logger = self.logger.append({'Date':10, 'Message':message, 'uid':11}, ignore_index=True)


class Strategy:
    def __init__(self):
        self.LOGS = pd.DataFrame(columns=['Date', 'Message', 'uid'])
        
        self.blk1 = Block(logger=self.LOGS)
        self.blk2 = Block(logger=self.LOGS)
        
    def nxt(self):
        self.blk1.check()
        self.blk2.check()   
       
        
strat = Strategy()

for i in range(0,5):
    strat.nxt()
    
print(strat.LOGS)

print(strat.blk1.logger)
print(strat.blk2.logger)

These are the outputs:

>>Empty DataFrame
Columns: [Date, Message, uid]
Index: []

>>  Date Message uid
0   10   test!  11
1   10   test!  11
2   10   test!  11
3   10   test!  11
4   10   test!  11

>>  Date Message uid
0   10   test!  11
1   10   test!  11
2   10   test!  11
3   10   test!  11
4   10   test!  11

I don't understand why nothing is appended to the LOGS attribute of Strategy. I thought each Block was pointing to strat.LOGS because I passed logger=self.LOGS.

Thanks for your answers.

Upvotes: 3

Answers (2)

hpchavaz

Reputation: 1388

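From the pandas.DataFrame.append documentation:

pandas.DataFrame.append ... returns a new object

So append does not modify the DataFrame it is called on. In writeLog, the assignment self.logger = self.logger.append(...) rebinds the Block's own attribute to the new DataFrame, while strat.LOGS still refers to the original, empty one.

This is also why strat.blk1.logger and strat.blk2.logger end up as different objects.

Here is your code with some debugging modifications: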

import pandas as pd

class Block:
    uid = 0
    def __init__(self, logger):
        self.logger = logger
        
    def check(self):
        self.writeLog('test!')
        
    def writeLog(self, message):
        Block.uid += 1
        self.logger = self.logger.append({'Date':10, 'Message':message, 'uid':Block.uid}, ignore_index=True)


class Strategy:
    def __init__(self):
        self.LOGS = pd.DataFrame(columns=['Date', 'Message', 'uid'])
        
        self.blk1 = Block(self.LOGS)
        self.blk2 = Block(self.LOGS)
        
    def nxt(self):
        self.blk1.check()
        self.blk2.check()   

strat = Strategy()

print("BEFORE\n")
print('strat.LOGS\n', strat.LOGS, '\n')
print('id()', id(strat.LOGS), '\n')
print('strat.blk1.logger\n', 'id', id(strat.blk1.logger), '\n',strat.blk1.logger, '\n')
print('strat.blk2.logger\n', 'id', id(strat.blk2.logger), '\n', strat.blk2.logger, '\n')      

for i in range(0,5):
    strat.nxt()
    
print("\n\nAFTER\n")
print('strat.LOGS\n', strat.LOGS, '\n')
print('id(strat.LOGS)', id(strat.LOGS), '\n')
print('strat.blk1.logger\n', 'id', id(strat.blk1.logger), '\n',strat.blk1.logger, '\n')
print('strat.blk2.logger\n', 'id', id(strat.blk2.logger), '\n', strat.blk2.logger, '\n')

gives

BEFORE

strat.LOGS
 Empty DataFrame
Columns: [Date, Message, uid]
Index: [] 

id() 140039061035088 

strat.blk1.logger
 id 140039061035088 
 Empty DataFrame
Columns: [Date, Message, uid]
Index: [] 

strat.blk2.logger
 id 140039061035088 
 Empty DataFrame
Columns: [Date, Message, uid]
Index: [] 



AFTER

strat.LOGS
 Empty DataFrame
Columns: [Date, Message, uid]
Index: [] 

id(strat.LOGS) 140039061035088 

strat.blk1.logger
 id 140039063529168 
   Date Message uid
0   10   test!   1
1   10   test!   3
2   10   test!   5
3   10   test!   7
4   10   test!   9 

strat.blk2.logger
 id 140039062432592 
   Date Message uid
0   10   test!   2
1   10   test!   4
2   10   test!   6
3   10   test!   8
4   10   test!  10
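
If the goal is for every Block to write into one log owned by the Strategy, one possible approach is to share a mutable container and mutate it in place instead of rebinding a DataFrame. The sketch below is only an illustration under that assumption: the records list, the _records attribute and the LOGS property are hypothetical names that do not appear in the original code. It also avoids DataFrame.append, which was deprecated in pandas 1.4 and removed in pandas 2.0.

```python
import pandas as pd


class Block:
    def __init__(self, records):
        # Keep a reference to the shared list and never rebind it.
        self.records = records

    def check(self):
        self.writeLog('test!')

    def writeLog(self, message):
        # list.append mutates the list in place, so every object holding
        # a reference to it sees the new row.
        self.records.append({'Date': 10, 'Message': message, 'uid': 11})


class Strategy:
    COLUMNS = ['Date', 'Message', 'uid']

    def __init__(self):
        # Hypothetical shared store (not in the original code).
        self._records = []
        self.blk1 = Block(self._records)
        self.blk2 = Block(self._records)

    def nxt(self):
        self.blk1.check()
        self.blk2.check()

    @property
    def LOGS(self):
        # Build the DataFrame on demand from the accumulated records.
        return pd.DataFrame(self._records, columns=self.COLUMNS)


strat = Strategy()

for i in range(5):
    strat.nxt()

print(strat.LOGS)
```

Building the DataFrame once, when it is read, is usually cheaper than growing a DataFrame row by row, since each row-wise concatenation copies the existing data.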

Upvotes: 3

lane

Reputation: 886

That is because DataFrame.append returns a new object every time; see the pandas documentation on append.

In general, pandas methods return new objects rather than modifying existing ones in place. Because of this, I would recommend changing the code a little so that each Block has its own logger. This is also good protection if you ever use multithreading. You can then add a get_log method to Strategy that concatenates the current contents of both Block loggers. Any sorting by date can happen there as well, if needed.

import pandas as pd

class Block:
    def __init__(self):
        self.logger = pd.DataFrame(columns=['Date', 'Message', 'uid'])
        
    def check(self):
        self.writeLog('test!')
        
    def writeLog(self, message):
        self.logger = self.logger.append({'Date':10, 'Message':message, 'uid':11}, ignore_index=True)


class Strategy:
    def __init__(self):
       
        self.blk1 = Block()
        self.blk2 = Block()
        
    def nxt(self):
        self.blk1.check()
        self.blk2.check()   
        
    def get_log(self):
        return pd.concat((self.blk1.logger, self.blk2.logger), ignore_index=True)
       
        
strat = Strategy()

for i in range(0,5):
    strat.nxt()
    
print(strat.get_log())

print(strat.blk1.logger)
print(strat.blk2.logger)

Output

  Date Message uid
0   10   test!  11
1   10   test!  11
2   10   test!  11
3   10   test!  11
4   10   test!  11
5   10   test!  11
6   10   test!  11
7   10   test!  11
8   10   test!  11
9   10   test!  11
  Date Message uid
0   10   test!  11
1   10   test!  11
2   10   test!  11
3   10   test!  11
4   10   test!  11
  Date Message uid
0   10   test!  11
1   10   test!  11
2   10   test!  11
3   10   test!  11
4   10   test!  11
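
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the code above raises AttributeError on current versions. Below is a minimal sketch of the same per-Block design using pd.concat, assuming pandas 2.x. The COLUMNS constant and the empty-frame check are additions for illustration and are not part of the code above.

```python
import pandas as pd

# Hypothetical module-level constant, added for illustration.
COLUMNS = ['Date', 'Message', 'uid']


class Block:
    def __init__(self):
        self.logger = pd.DataFrame(columns=COLUMNS)

    def check(self):
        self.writeLog('test!')

    def writeLog(self, message):
        row = pd.DataFrame([{'Date': 10, 'Message': message, 'uid': 11}])
        if self.logger.empty:
            # Start from the first row instead of concatenating with an
            # empty frame, which recent pandas versions warn about.
            self.logger = row
        else:
            # pd.concat also returns a new object, so the result still
            # has to be assigned back to self.logger.
            self.logger = pd.concat([self.logger, row], ignore_index=True)


class Strategy:
    def __init__(self):
        self.blk1 = Block()
        self.blk2 = Block()

    def nxt(self):
        self.blk1.check()
        self.blk2.check()

    def get_log(self):
        # Combine the per-Block logs into one DataFrame on request.
        return pd.concat((self.blk1.logger, self.blk2.logger), ignore_index=True)


strat = Strategy()

for i in range(5):
    strat.nxt()

print(strat.get_log())
```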

Upvotes: 3
