Replacing string with value calculated from the max of another column in a dataframe

Question

I have a dataframe with an ID column that has dtype Object (as contains INTs and STRs) so am trying to use np.where to replace each of them in turn with the next highest number... However for some reason in the example below it's only replacing one of the 2 strings and I have no idea why?

df = pd.DataFrame({'IDstr':['480610_ABC_087', '78910_ABC_087','4806105017087','414149'],
                       'IDint':[ 0, 0, 4806105017087, 414149]})
print (df)

unique_str_IDs = df['IDstr'][df['IDstr'].str.contains("ABC", na=False)].unique()
for i in range(len(unique_str_IDs)):
    df['SKUintTEST']=np.where(df['IDstr'] == unique_str_IDs[i].strip(), 
            df['SKUint_y'].max()+i+1, df['SKUint_y'])

Has anyone got any ideas?

Ben.T · Accepted Answer

You can use map with a dictionary created with in incremental for each unique id, then fillna with the original value for the rows not mapped:

df = pd.DataFrame({'IDstr':['480610_ABC_087', '78910_ABC_087','4806105017087','414149'],
                    'IDint':[ 0, 0, 4806105017087, 414149], 
                    'SKUint_y': range(10,14)})

unique_str_IDs = df.loc[df['IDstr'].str.contains("ABC", na=False), 'IDstr'].unique()

df['SKUintTEST'] = df['IDstr'].map({idx:i for i, idx in enumerate(unique_str_IDs, df.SKUint_y.max()+1)})\
                              .fillna(df.SKUint_y)

print (df)
            IDstr          IDint  SKUint_y  SKUintTEST
0  480610_ABC_087              0        10        14.0
1   78910_ABC_087              0        11        15.0
2   4806105017087  4806105017087        12        12.0
3          414149         414149        13        13.0

Replacing string with value calculated from the max of another column in a dataframe

Answers (1)

Related Questions