Reputation: 95
I have a dataframe with an ID column that has dtype Object (as contains INTs and STRs) so am trying to use np.where
to replace each of them in turn with the next highest number... However for some reason in the example below it's only replacing one of the 2 strings and I have no idea why?
df = pd.DataFrame({'IDstr':['480610_ABC_087', '78910_ABC_087','4806105017087','414149'],
'IDint':[ 0, 0, 4806105017087, 414149]})
print (df)
unique_str_IDs = df['IDstr'][df['IDstr'].str.contains("ABC", na=False)].unique()
for i in range(len(unique_str_IDs)):
df['SKUintTEST']=np.where(df['IDstr'] == unique_str_IDs[i].strip(),
df['SKUint_y'].max()+i+1, df['SKUint_y'])
Has anyone got any ideas?
Upvotes: 1
Views: 77
Reputation: 29635
You can use map
with a dictionary created with in incremental for each unique id, then fillna
with the original value for the rows not mapped:
df = pd.DataFrame({'IDstr':['480610_ABC_087', '78910_ABC_087','4806105017087','414149'],
'IDint':[ 0, 0, 4806105017087, 414149],
'SKUint_y': range(10,14)})
unique_str_IDs = df.loc[df['IDstr'].str.contains("ABC", na=False), 'IDstr'].unique()
df['SKUintTEST'] = df['IDstr'].map({idx:i for i, idx in enumerate(unique_str_IDs, df.SKUint_y.max()+1)})\
.fillna(df.SKUint_y)
print (df)
IDstr IDint SKUint_y SKUintTEST
0 480610_ABC_087 0 10 14.0
1 78910_ABC_087 0 11 15.0
2 4806105017087 4806105017087 12 12.0
3 414149 414149 13 13.0
Upvotes: 1