Reputation: 136
I don't know how to approach this issue. I have a data frame that looks like this:
cuenta_bancaria nombre_empresa perfil_cobranza usuario_id usuario_web
5545 a 123 500199 5012
5551 a 123 500199 3321
5551 a 55 500199 5541
5551 b 55 500199 5246
What I need to do is iterate over the rows for each usuario_id, check whether anything differs from the previous row, and create a new data set with the column that changed and the usuario_web responsible for the change, producing a data frame that looks like this:
usuario_id cambio usuario_web
500199 cuenta_bancaria 3321
500199 perfil_cobranza 5541
500199 nombre_empresa 5246
Is there any way to do this? I'm working with pandas in Python, and this dataset could be fairly large, let's say around 10000 rows, sorted by usuario_id.
Thanks for any advice.
Upvotes: 0
Views: 163
Reputation: 607
There are a couple of ways to iterate over a dataframe:
for index, row in df.iterrows():
    pass  # process each row here
but since you're wanting to reference the prior row, I think the easiest will be to iterate by position:
import pandas as pd

rows = []  # collect changes in a list; DataFrame.append was removed in pandas 2.0
for i in range(1, len(df)):
    current = df.iloc[i]
    last = df.iloc[i - 1]
    newrow = {'usuario_id': current['usuario_id'], 'usuario_web': current['usuario_web']}
    # record the first tracked column that differs from the previous row
    if current['cuenta_bancaria'] != last['cuenta_bancaria']:
        newrow['cambio'] = 'cuenta_bancaria'
        rows.append(newrow)
    elif current['nombre_empresa'] != last['nombre_empresa']:
        newrow['cambio'] = 'nombre_empresa'
        rows.append(newrow)
    elif current['perfil_cobranza'] != last['perfil_cobranza']:
        newrow['cambio'] = 'perfil_cobranza'
        rows.append(newrow)
# build the result once, with the column order from the question
df2 = pd.DataFrame(rows, columns=['usuario_id', 'cambio', 'usuario_web'])
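Note that the loop above compares row i with row i-1 even when the two rows belong to different usuario_id values, and its elif chain records at most one column per row. Here is a minimal sketch that handles both cases, assuming the frame is sorted by usuario_id as the question states. The function name `changes_by_loop` and the `TRACKED` tuple are made up for illustration, and the sample frame is just the data from the question:

```python
import pandas as pd

# Columns whose changes should be reported (hypothetical name, taken from the question's data).
TRACKED = ("cuenta_bancaria", "nombre_empresa", "perfil_cobranza")

def changes_by_loop(df, tracked=TRACKED):
    """Return one row per changed column, comparing each row only
    with the previous row of the same usuario_id."""
    records = []
    previous = None
    for row in df.itertuples(index=False):
        # Skip the comparison at the first row of each usuario_id.
        if previous is not None and row.usuario_id == previous.usuario_id:
            for col in tracked:
                if getattr(row, col) != getattr(previous, col):
                    records.append({
                        "usuario_id": row.usuario_id,
                        "cambio": col,
                        "usuario_web": row.usuario_web,
                    })
        previous = row
    return pd.DataFrame(records, columns=["usuario_id", "cambio", "usuario_web"])

# Sample data copied from the question.
sample = pd.DataFrame({
    "cuenta_bancaria": [5545, 5551, 5551, 5551],
    "nombre_empresa": ["a", "a", "a", "b"],
    "perfil_cobranza": [123, 123, 55, 55],
    "usuario_id": [500199] * 4,
    "usuario_web": [5012, 3321, 5541, 5246],
})
result = changes_by_loop(sample)
print(result)
```

For around 10000 rows a plain Python loop like this should still be workable, although a vectorized comparison will usually be faster.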
Upvotes: 1
Reputation: 402363
Compare adjacent rows with ne + shift to obtain a mask, then use the mask to index df to get the required rows and df.columns to get the columns which changed:
c = df.columns.intersection(
    ['nombre_empresa', 'perfil_cobranza', 'cuenta_bancaria']
)
i = df[c].ne(df[c].shift())
j = i.sum(1).eq(1)
df = df.loc[j, ['usuario_id', 'usuario_web']]
df.insert(1, 'cambio', c[i[j].values.argmax(1)])
df
usuario_id cambio usuario_web
1 500199 cuenta_bancaria 3321
2 500199 perfil_cobranza 5541
3 500199 nombre_empresa 5246
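The snippet above shifts the whole frame rather than each usuario_id separately, and `i.sum(1).eq(1)` keeps only rows where exactly one column changed. If the data holds several users, or a row can change more than one column at once, something along these lines might work. It is a sketch, not a tested drop-in: the function name `changes_vectorized` and the second user (500200) in the sample are invented for illustration, and it assumes the frame has a unique index and no missing values in the tracked columns (NaN compares as "different").

```python
import numpy as np
import pandas as pd

def changes_vectorized(df, tracked=("cuenta_bancaria", "nombre_empresa", "perfil_cobranza")):
    """One output row per (row, changed column), compared within each usuario_id."""
    tracked = list(tracked)
    groups = df.groupby("usuario_id")
    # Previous row's values within the same usuario_id (NaN for each user's first row).
    prev = groups[tracked].shift()
    # True for every row that has a predecessor in its own group.
    has_prev = groups.cumcount().gt(0).to_numpy()
    # Boolean matrix: value differs from the previous row of the same user.
    changed = df[tracked].ne(prev).to_numpy() & has_prev[:, None]
    # Positions of the True cells, ordered by row and then by column.
    rows, cols = np.nonzero(changed)
    return pd.DataFrame({
        "usuario_id": df["usuario_id"].to_numpy()[rows],
        "cambio": np.array(tracked)[cols],
        "usuario_web": df["usuario_web"].to_numpy()[rows],
    })

# Question's data plus a made-up second user whose last row changes two columns.
sample = pd.DataFrame({
    "cuenta_bancaria": [5545, 5551, 5551, 5551, 7001, 7001],
    "nombre_empresa": ["a", "a", "a", "b", "c", "d"],
    "perfil_cobranza": [123, 123, 55, 55, 9, 10],
    "usuario_id": [500199] * 4 + [500200] * 2,
    "usuario_web": [5012, 3321, 5541, 5246, 8001, 8002],
})
out = changes_vectorized(sample)
print(out)
```

The first row of user 500200 is not reported even though it differs from the row before it, because that previous row belongs to another user.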
Upvotes: 1