Reputation: 65
I have data like this, read from an Excel file:
dminerals=pd.read_excel(datafile)
print(dminerals.head(5))
Then I replace the 'Tr' and NaN values using a for loop with this script:
# .items() replaces .iteritems(), which was removed in pandas 2.0
for key, value in dminerals.items():
    dminerals[key] = dminerals[key].replace(to_replace='Tr', value=1)
    dminerals[key] = dminerals[key].replace(to_replace=np.nan, value=0)
Then I print it again, and the replacement seems to have worked. But when I print the column's data type, it still shows object:
print(dminerals.head(5))
print(dminerals['C'].dtypes)
I tried using .astype to convert one of the columns, ['C'], to integer, but the result is a ValueError:
dminerals['C'].astype(int)
ValueError: invalid literal for int() with base 10: 'tr'
I thought I had already changed every 'Tr' in the dataframe to an integer value. Is there anything I missed in the process above? Please help, thank you in advance!
Upvotes: 1
Views: 101
Reputation: 18367
You are replacing Tr with 1, but there is also a tr that is not being replaced (this is what your ValueError is saying). Remember that Python is case sensitive. Also, a for loop over the columns is unnecessary here and slower than a single call on the whole dataframe, so you might want to try the following line of code instead:
dminerals = dminerals.replace({'Tr': 1, 'tr': 1}).fillna(0)
I'm using fillna(), which is also a better way to fill the null values with the specified value (0 in this case) than using replace.
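If the sheet might contain other spellings as well (for example TR, or tr with stray spaces), a case-insensitive check may be more robust than listing each variant. Here is a minimal sketch of that idea. The sample data and the helper name is_trace are made up for illustration and are not from the question; it also assumes every remaining value is numeric once the trace markers and NaN are handled.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the Excel sheet from the question
dminerals = pd.DataFrame({
    "C": [3, "Tr", np.nan, "tr", 7],
    "Q": ["TR", 2, 5, np.nan, " tr "],
})

def is_trace(value):
    """Return True for any spelling of the trace marker, e.g. 'Tr', 'tr', ' TR '."""
    return isinstance(value, str) and value.strip().lower() == "tr"

# Replace every trace marker with 1, column by column, leaving other values untouched
cleaned = dminerals.apply(lambda col: col.map(lambda v: 1 if is_trace(v) else v))

# Fill the missing values with 0, then convert everything to integers
cleaned = cleaned.fillna(0).astype(int)

print(cleaned)
print(cleaned.dtypes)
```

If astype(int) still raises a ValueError after this, the message shows the offending string, which tells you what other non-numeric value is left in the data.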
Upvotes: 1