Reputation: 191
I'm trying to:
Import a CSV of UPC codes into a dataframe. If the UPC code is 11 characters, prepend a '0' to it. Ex: 19962123818 --> 019962123818
This is the code:
#check UPC code length. If 11 characters, prepend '0'. If < 11 or > 13, throw an error
for index, row in clean_data.iterrows():
    if len(row['UPC']) == 11:
        row['UPC'] = '0' + row['UPC']
        #clean_data.set_value(row, 'UPC', '0' + row['UPC'])
        print("Edited UPC:", row['UPC'], type(row['UPC']))
    if len(row['UPC']) < 11 or len(row['UPC']) > 13:
        print('Error, UPC length < 11 or > 13:')
        print("Error in UPC:", row['UPC'])
        quit()
However, when I print the data, the original value is unchanged. Does anyone know what is causing this issue?
I tried the set_value method as mentioned in other posts, but it didn't work.
Thanks!
Thanks for the vectorized approach, much cleaner! However, I get the following error, and the value is still not updating:
Upvotes: 1
Views: 108
Reputation: 10203
Can I suggest a different method?
#identify the strings that are exactly 11 characters
fix_indx = clean_data.UPC.astype(str).str.len() == 11
#prepend '0' to these strings, writing back into the UPC column only
clean_data.loc[fix_indx, 'UPC'] = '0' + clean_data.loc[fix_indx, 'UPC'].astype(str)
To fix the others, you can similarly do (this assumes `import numpy as np`):
bad_length_indx = clean_data.UPC.astype(str).str.len() > 13
clean_data.loc[bad_length_indx, 'UPC'] = np.nan
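Put together on a toy DataFrame (the sample values here are hypothetical stand-ins for your CSV data), the masked assignment looks like this:

```python
import pandas as pd

# Hypothetical sample: one 11-character UPC and one already-valid 12-character UPC
clean_data = pd.DataFrame({'UPC': ['19962123818', '036000291452']})

# Boolean mask selecting the 11-character codes
fix_indx = clean_data.UPC.astype(str).str.len() == 11

# Prepend '0' only to the masked rows, writing back into the 'UPC' column
clean_data.loc[fix_indx, 'UPC'] = '0' + clean_data.loc[fix_indx, 'UPC'].astype(str)

print(clean_data.UPC.tolist())  # ['019962123818', '036000291452']
```

Note that `.loc[mask, 'UPC']` targets the single column; assigning through `.loc[mask]` alone would overwrite every column in the selected rows.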
Upvotes: 4
Reputation: 191
I finally fixed it. Thanks again for the vectorized idea. If anyone has this issue in the future, here's the code I used. Also, see this post for more info.
UPC_11_char = clean_data.UPC.astype(str).str.len() == 11
clean_data.loc[UPC_11_char, 'UPC'] = '0' + clean_data.loc[UPC_11_char, 'UPC'].astype(str)
print(clean_data[UPC_11_char]['UPC'])
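As an aside, pandas' `str.zfill` can do the left-padding in one step, since it pads each string with zeros up to a target width and leaves longer strings untouched. A minimal sketch, again on hypothetical sample data:

```python
import pandas as pd

clean_data = pd.DataFrame({'UPC': ['19962123818', '036000291452']})

# zfill(12) left-pads with zeros to width 12: 11-char codes gain a leading '0',
# 12- and 13-char codes pass through unchanged
clean_data['UPC'] = clean_data['UPC'].astype(str).str.zfill(12)

print(clean_data.UPC.tolist())  # ['019962123818', '036000291452']
```

One caveat: `zfill` would also pad codes shorter than 11 characters, which this question treats as errors, so run the length validation first.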
Upvotes: 0
Reputation: 32095
According to the iterrows documentation:
- You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
row['UPC'] = '0' + row['UPC']
silently modifies a copy of the row, and clean_data is kept unmodified.
Adopt a vectorized version of your algorithm, as @Gene suggests.
Upvotes: 1