Create log with change in values in pandas dataframe

Question

I have been working with a dataframe having the following format (the actual table has much more rows(id's) and columns(value_3, value_4 etc..)):

where for each id, the status column has the value 'new' if this is the first entry for that id, and the value 'modified' if any of the value_1, value_2 columns have changed compared to their previous value. I would like to create a log of any changes made in the table, in particular I would like the resulted format for the given data above to be something like this:

Ideally, I would like to avoid using loops, so could you please suggest any more efficient pythonic way to achieve the format above?

I have seen the answers posted for the question here: Determining when a column value changes in pandas dataframe

which partly do the job I want (using shift or diff) for identifying cases where there was a change, and I was wondering if this is the best way to build on for my case, or if there is a more efficient way to do that and speed up the process. Ideally, I would like something that can work for both numeric and non-numeric values in value_1, value_2, etc columns..

Code for creating the sample data of the first pic:

    import pandas as pd
    data = [[1,2,5,'new'], [1,1,5,'modified'], [1,0,5,'modified'], 
    [2,5,2,'new'], [2,5,3,'modified'], [2,5,4,'modified']  ]
    df = pd.DataFrame(data, columns = ['id', 'value_1', 'value_2', 
   'status'])
    df

Many thanks in advance for any suggestion/help!

BENY · Accepted Answer

We do need melt first then groupby after drop_duplicates

s = df.melt(['id','status']).drop_duplicates(['id','variable','value'])
s['new'] = s.groupby(['id','variable'])['value'].shift()

s #s.sort_values('id')
    id    status variable  value  new
0    1       new  value_1      2  NaN
1    1  modified  value_1      1  2.0
2    1  modified  value_1      0  1.0
3    2       new  value_1      5  NaN
6    1       new  value_2      5  NaN
9    2       new  value_2      2  NaN
10   2  modified  value_2      3  2.0
11   2  modified  value_2      4  3.0

Create log with change in values in pandas dataframe

Answers (1)

Related Questions