muhammad tajul arifin

Reputation: 65

Replace a string in an entire DataFrame read from Excel with a value

I have data like this, read from an Excel file:

dminerals=pd.read_excel(datafile)
print(dminerals.head(5))


Then I replace the 'Tr' and NaN values using a for loop with this script:

# DataFrame.iteritems() was removed in pandas 2.0; items() is the equivalent
for key, value in dminerals.items():
    dminerals[key] = dminerals[key].replace(to_replace='Tr', value=1)
    dminerals[key] = dminerals[key].replace(to_replace=np.nan, value=0)

Then I print the dataframe again along with the dtype of one column. The replacement seems to work, but the column is reported as object dtype.

print(dminerals.head(5))
print(dminerals['C'].dtypes)


I tried using .astype to convert one of the columns, ['C'], to integer, but the result is a ValueError:

dminerals['C'].astype(int)
ValueError: invalid literal for int() with base 10: 'tr'

I thought I had already changed 'Tr' in the dataframe into an integer value. Is there anything I missed in the process above? Please help, thank you in advance!

Upvotes: 1

Views: 101

Answers (1)

Celius Stingher

Reputation: 18367

You are replacing Tr with 1, but there is a lowercase tr that is not being replaced (this is what your ValueError is saying). Remember that Python is case sensitive. Also, looping over the columns is unnecessary and inefficient here; you might want to try the following line of code instead:

dminerals = dminerals.replace({'Tr':1,'tr':1}).fillna(0)

I'm using fillna(), which is a better way to fill the null values with a specified value (0 in this case) than using replace.
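
If other spellings such as 'TR' or ' tr ' might also appear in the sheet, a case-insensitive variant could look like the sketch below. It is only an illustration: the small dataframe and its column names are made up to stand in for the Excel data, and trace_to_number is a hypothetical helper, not part of pandas. Because the columns may still be object dtype after the replacement, the sketch ends with an explicit astype(int).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Excel data; values and column names are assumptions.
dminerals = pd.DataFrame({
    "C": [3, "Tr", "tr", np.nan, 5],
    "D": ["TR", 2, np.nan, 7, "Tr"],
})

def trace_to_number(value):
    """Map any spelling of 'tr' to 1 and leave every other value unchanged."""
    if isinstance(value, str) and value.strip().lower() == "tr":
        return 1
    return value

# Apply the helper to every cell, fill missing values with 0, then cast to int.
cleaned = (
    dminerals
    .apply(lambda col: col.map(trace_to_number))
    .fillna(0)
    .astype(int)
)

print(cleaned)
print(cleaned.dtypes)
```

If astype(int) still raises a ValueError after this, the message names the remaining string, which can then be handled the same way.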

Upvotes: 1
