gcourse
gcourse

Reputation: 31

Removing some characters in a DataFrame

I have a problem on replacing this value "..." by NaN. Here is my code

    import pandas as pd
    import numpy as np
    energy = pd.read_excel('Energy Indicators.xls')
    del energy['Unnamed: 0']
    del energy['Unnamed: 1']
    energy.columns = ['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']
    energy.replace("...", np.NaN)  
    energy['Energy Supply']

They always appear :

Out[46]:
17      321
18      102
19     1959
20      ...
21        9
       ... 
241     344
242     400
243     480
244     NaN
245     NaN
Name: Energy Supply, Length: 229, dtype: objec

Anyone can help me solve this ?

Thank you

Upvotes: 0

Views: 44

Answers (2)

Big Bro
Big Bro

Reputation: 944

Your problem is that replace returns a DataFrame by defaulft (see doc). To solve this you can either

energy = energy.replace("...", np.NaN)

or

energy.replace("...", np.NaN, inplace=True)

You can even avoid this problem altogether by specifying that "..." should be interpreted as NaN at read time:

energy = pd.read_excel("Energy Indicators.xls", na_values="...")

See doc.

Upvotes: 1

Quang Hoang
Quang Hoang

Reputation: 150735

That's because the ... goes with leading/trailing spaces. And also, you should check if they are one three-dot character or .... If your supply is numerical as shown, you can do:

energy['Energy Supply'] = pd.to_numeric(energy['Energy Supply'], errors='coerce')

Upvotes: 0

Related Questions