Vero
Vero

Reputation: 459

Check if row with correct values in dataframe exists and append if not

I need to check if the correct value in one column matches a specific ID. If the id exists but the value is incorrect, adjust the value. If id doesn't exist, add a row with the correct value.

I know how to do all this separately. But I would like to have something dynamic like function which will do it within the script. I do not use to work with "if-then", prefer np.where, however I think in this case, it must be row-wise interactive function that can combine all operations in one block.

Example. One df has the id which I want to check, but with wrong value (eg, id - 5 must have value - 89) Another df (df2) doesn't have id 5 that i need to append with correct value

I need to do it on one data base and act accordingly but here are different cases:

import pandas as pd
import numpy as np

df = pd.DataFrame({'id':[1,2,3,4,5] ,                  
               'value':[23,34,45,56,67]})

df2 = pd.DataFrame({'id':[1,2,3,4] ,                  
               'value':[23,34,45,56]})

to check value:

value_to_check=[5]
check=df['id'].isin(value_to_check)

to correct value:

df['value']=np.where((df['id']==5)& (df['value']!=89), 89,df['value'] )

to append if not exist:

df_new = pd.DataFrame({'id':[5] ,                  
               'value':[89]})

df2=df2.append(df_new, sort=False)

Hoe to put all together in one function?

Thank you

Upvotes: 1

Views: 1472

Answers (1)

jezrael
jezrael

Reputation: 862911

Idea is use DataFrame.loc for set values by 89 - if not exist is added new row, if exist is overwrite value. There is also added DataFrame.astype for convert to original dtypes, if is appended new row:

df2 = pd.DataFrame({'id':[1,2,3,4] ,                  
               'value':[23,34,45,56]})

df = pd.DataFrame({'id':[1,2,3,4,5] ,                  
               'value':[23,34,45,56,67]})

def test(df, value_to_check):
    df = df.set_index('id')
    dtypes = df.dtypes
    df.loc[value_to_check, ['value']] = 89
    return df.astype(dtypes).reset_index()

df1 = test(df, 5)
print (df1)
   id  value
0   1     23
1   2     34
2   3     45
3   4     56
4   5     89

df1 = test(df2, 5)
print (df1)
   id  value
0   1     23
1   2     34
2   3     45
3   4     56
4   5     89

Upvotes: 1

Related Questions