Iterative comparison with pandas

Question

I don't know how to approach this issue. I have a data frame that looks like this:

cuenta_bancaria  nombre_empresa  perfil_cobranza  usuario_id  usuario_web
5545             a               123              500199      5012
5551             a               123              500199      3321
5551             a               55               500199      5541
5551             b               55               500199      5246

What I need to do is iterate over the rows of each usuario_id, check whether any column differs from the previous row, and build a new data frame with the column that changed and the usuario_web responsible for the change, like this:

usuario_id  cambio           usuario_web
500199      cuenta_bancaria  3321
500199      perfil_cobranza  5541
500199      nombre_empresa   5246

Is there any way to do this? I'm working with pandas in Python, and the dataset could get fairly large (around 10,000 rows), sorted by usuario_id.

Thanks for any advice.

cs95 · Accepted Answer

Compare adjacent rows with ne + shift to obtain a boolean mask, then use it to

- index into df to get the rows that changed
- index into df.columns to get the column that changed in each of those rows
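To make the mask concrete, here is a small sketch that rebuilds the sample frame from the question (values typed in by hand) and prints the intermediate objects. The first row is compared against the all-NaN row that shift() produces, and x != NaN is True, so every column counts as changed there; that row is then excluded by a filter that keeps only rows with exactly one change. Because the sample has a single usuario_id, a plain shift() over the whole frame is enough here.

```python
import pandas as pd

# Sample data from the question, typed in by hand.
df = pd.DataFrame({
    'cuenta_bancaria': [5545, 5551, 5551, 5551],
    'nombre_empresa': ['a', 'a', 'a', 'b'],
    'perfil_cobranza': [123, 123, 55, 55],
    'usuario_id': [500199] * 4,
    'usuario_web': [5012, 3321, 5541, 5246],
})

c = ['cuenta_bancaria', 'nombre_empresa', 'perfil_cobranza']

# True wherever a value differs from the one in the previous row.
# shift() fills row 0 with NaN, and x != NaN is True, so row 0 is all True.
i = df[c].ne(df[c].shift())

# Number of columns that changed in each row.
changed = i.sum(axis=1)

print(i)
print(changed.tolist())  # [3, 1, 1, 1]
```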

c = df.columns.intersection(
        ['nombre_empresa', 'perfil_cobranza', 'cuenta_bancaria']
)

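# True where a value differs from the previous row of the same usuario_id
i = df[c].ne(df.groupby('usuario_id')[c].shift())
# keep only rows in which exactly one of the columns changed
j = i.sum(axis=1).eq(1)

df = df.loc[j, ['usuario_id', 'usuario_web']]
# argmax gives the position of the single True in each kept row
df.insert(1, 'cambio', c[i[j].values.argmax(axis=1)])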

df

   usuario_id           cambio  usuario_web
1      500199  cuenta_bancaria         3321
2      500199  perfil_cobranza         5541
3      500199   nombre_empresa         5246
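Filtering with eq(1) keeps only rows where exactly one column changed, so a row that edits two fields at once disappears from the result. Below is a minimal sketch, not the answer's own method, that emits one output row per changed column instead. It shifts within each usuario_id so a user's first row is never compared with another user's data, and it uses numpy's nonzero() on the mask rather than argmax. The second user (500200) and its values are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# The question's sample plus a hypothetical second user (500200) for illustration.
df = pd.DataFrame({
    'cuenta_bancaria': [5545, 5551, 5551, 5551, 5551, 5600],
    'nombre_empresa': ['a', 'a', 'a', 'b', 'b', 'c'],
    'perfil_cobranza': [123, 123, 55, 55, 10, 10],
    'usuario_id': [500199] * 4 + [500200] * 2,
    'usuario_web': [5012, 3321, 5541, 5246, 6001, 6002],
})

cols = ['cuenta_bancaria', 'nombre_empresa', 'perfil_cobranza']

# Previous row within the same usuario_id; NaN for each user's first row.
prev = df.groupby('usuario_id')[cols].shift()

# A user's first row has nothing to compare against, so exclude it explicitly.
not_first = (df.groupby('usuario_id').cumcount() > 0).to_numpy()

# Boolean matrix: True where a column changed relative to that user's previous row.
mask = df[cols].ne(prev).to_numpy() & not_first[:, None]

# nonzero() walks the matrix row by row, so the result keeps the original row
# order and contains one entry per changed column.
rows, col_pos = mask.nonzero()
out = pd.DataFrame({
    'usuario_id': df['usuario_id'].to_numpy()[rows],
    'cambio': np.asarray(cols)[col_pos],
    'usuario_web': df['usuario_web'].to_numpy()[rows],
})
print(out)
```

Here row 5 changes both cuenta_bancaria and nombre_empresa, so it yields two output rows instead of being dropped. With a shift() over the whole frame instead of per group, row 4 would be compared with the last row of user 500199 and wrongly reported as a perfil_cobranza change.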
