Reputation: 459
I need to check if the correct value in one column matches a specific ID. If the id exists but the value is incorrect, adjust the value. If id doesn't exist, add a row with the correct value.
I know how to do all this separately. But I would like to have something dynamic like function which will do it within the script. I do not use to work with "if-then", prefer np.where, however I think in this case, it must be row-wise interactive function that can combine all operations in one block.
Example. One df has the id which I want to check, but with wrong value (eg, id - 5 must have value - 89) Another df (df2) doesn't have id 5 that i need to append with correct value
I need to do it on one data base and act accordingly but here are different cases:
import pandas as pd
import numpy as np
df = pd.DataFrame({'id':[1,2,3,4,5] ,
'value':[23,34,45,56,67]})
df2 = pd.DataFrame({'id':[1,2,3,4] ,
'value':[23,34,45,56]})
to check value:
value_to_check=[5]
check=df['id'].isin(value_to_check)
to correct value:
df['value']=np.where((df['id']==5)& (df['value']!=89), 89,df['value'] )
to append if not exist:
df_new = pd.DataFrame({'id':[5] ,
'value':[89]})
df2=df2.append(df_new, sort=False)
Hoe to put all together in one function?
Thank you
Upvotes: 1
Views: 1472
Reputation: 862911
Idea is use DataFrame.loc
for set values by 89
- if not exist is added new row, if exist is overwrite value. There is also added DataFrame.astype
for convert to original dtypes, if is appended new row:
df2 = pd.DataFrame({'id':[1,2,3,4] ,
'value':[23,34,45,56]})
df = pd.DataFrame({'id':[1,2,3,4,5] ,
'value':[23,34,45,56,67]})
def test(df, value_to_check):
df = df.set_index('id')
dtypes = df.dtypes
df.loc[value_to_check, ['value']] = 89
return df.astype(dtypes).reset_index()
df1 = test(df, 5)
print (df1)
id value
0 1 23
1 2 34
2 3 45
3 4 56
4 5 89
df1 = test(df2, 5)
print (df1)
id value
0 1 23
1 2 34
2 3 45
3 4 56
4 5 89
Upvotes: 1