RGRGRG

Reputation: 1079

Create log with change in values in pandas dataframe

I have been working with a dataframe in the following format (the actual table has many more rows (ids) and columns (value_3, value_4, etc.)):

       id  value_1  value_2    status
    0   1        2        5       new
    1   1        1        5  modified
    2   1        0        5  modified
    3   2        5        2       new
    4   2        5        3  modified
    5   2        5        4  modified

where for each id, the status column has the value 'new' if this is the first entry for that id, and 'modified' if any of the value_1, value_2 columns have changed compared to their previous value. I would like to create a log of any changes made in the table; in particular, for the data above I would like the result to look something like this:

[image: the desired change-log format]

Ideally, I would like to avoid using loops, so could you please suggest a more efficient, pythonic way to achieve the format above?

I have seen the answers posted for the question here: Determining when a column value changes in pandas dataframe

which partly do what I want (using shift or diff) to identify the cases where a change occurred. I was wondering whether this is the best approach to build on for my case, or whether there is a more efficient way to speed up the process. Ideally, I would like something that works for both numeric and non-numeric values in the value_1, value_2, etc. columns.

Code for creating the sample data of the first pic:

    import pandas as pd

    data = [[1, 2, 5, 'new'], [1, 1, 5, 'modified'], [1, 0, 5, 'modified'],
            [2, 5, 2, 'new'], [2, 5, 3, 'modified'], [2, 5, 4, 'modified']]
    df = pd.DataFrame(data, columns=['id', 'value_1', 'value_2', 'status'])
    df
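For reference, the shift/diff idea from the linked question can be made dtype-agnostic by comparing with `ne` instead of subtracting (a sketch of one possible check; `value_cols` and `changed` are just illustrative names), so it works for strings as well as numbers:

```python
import pandas as pd

data = [[1, 2, 5, 'new'], [1, 1, 5, 'modified'], [1, 0, 5, 'modified'],
        [2, 5, 2, 'new'], [2, 5, 3, 'modified'], [2, 5, 4, 'modified']]
df = pd.DataFrame(data, columns=['id', 'value_1', 'value_2', 'status'])

value_cols = ['value_1', 'value_2']  # illustrative name for the tracked columns
# Compare each row to the previous row within the same id; equality-based
# comparison (ne) also works for string/object columns, unlike diff().
changed = df[value_cols].ne(df.groupby('id')[value_cols].shift())
```

Each cell of `changed` is True where the value differs from the previous row of the same id (the first row of each id compares against NaN and so also reads True).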

Many thanks in advance for any suggestion/help!

Upvotes: 0

Views: 306

Answers (1)

BENY

Reputation: 323226

We need melt first, then groupby after drop_duplicates:

    s = df.melt(['id', 'status']).drop_duplicates(['id', 'variable', 'value'])
    s['new'] = s.groupby(['id', 'variable'])['value'].shift()

    s  # or s.sort_values('id') to order the log by id
        id    status variable  value  new
    0    1       new  value_1      2  NaN
    1    1  modified  value_1      1  2.0
    2    1  modified  value_1      0  1.0
    3    2       new  value_1      5  NaN
    6    1       new  value_2      5  NaN
    9    2       new  value_2      2  NaN
    10   2  modified  value_2      3  2.0
    11   2  modified  value_2      4  3.0
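Building on this, a small follow-up sketch (my own addition, not part of the answer) keeps only the rows where a previous value exists and renames the somewhat confusing `new` column, leaving a log of actual changes:

```python
import pandas as pd

data = [[1, 2, 5, 'new'], [1, 1, 5, 'modified'], [1, 0, 5, 'modified'],
        [2, 5, 2, 'new'], [2, 5, 3, 'modified'], [2, 5, 4, 'modified']]
df = pd.DataFrame(data, columns=['id', 'value_1', 'value_2', 'status'])

s = df.melt(['id', 'status']).drop_duplicates(['id', 'variable', 'value'])
s['new'] = s.groupby(['id', 'variable'])['value'].shift()

# Rows with NaN in 'new' are first occurrences ('new' status), not changes;
# drop them and rename the columns to make the log self-explanatory.
log = (s.dropna(subset=['new'])
        .rename(columns={'value': 'current', 'new': 'previous'}))
```

This leaves one row per detected change, with the column name in `variable` and the old and new values side by side.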

Upvotes: 1
