ricardoper9
ricardoper9

Reputation: 3

Can't properly replace blank values using pandas

I'm a python beginner, so I'm practicing some data analysis using pandas in a dataframe with a list of restaurants with a Michelin star (restaurants_df).

When I show, for example, the first 5 rows I notice that in the "price" column (object type) of row 4 I have a blank value:

In [ ]: restaurants_df.head()
Out[ ]:
    name            year    latitude    longitude     city        region    zipCode   cuisine           price
0   Kilian Stuba    2019    47.348580   10.17114   Kleinwalsertal Austria    87568  Creative              $
1   Pfefferschiff   2019    47.837870   13.07917    Hallwang      Austria    5300   Classic cuisine       $
2   Esszimmer       2019    47.806850   13.03409    Salzburg      Austria    5020   Creative              $
3   Carpe Diem      2019    47.800010   13.04006    Salzburg      Austria    5020   Market cuisine        $
4   Edvard          2019    48.216503   16.36852    Wien          Austria    1010   Modern cuisine  

Then I check how many NaN values are in each column. In the case of the price column there are 151 values:

In [ ]: restaurants_df.isnull().sum()
Out[ ]: name           0
        year           0
        latitude       0
        longitude      0
        city           2
        region         0
        zipCode      149
        cuisine        0
        price        151
        dtype: int64

After, I replace those values with the string "No Price", and confirm that all values have been replaced.

In [ ]: restaurants_df["price"].fillna("No Price", inplace = True)
        restaurants_df.isnull().sum()
Out[ ]: name           0
        year           0
        latitude       0
        longitude      0
        city           0
        region         0
        zipCode        0
        cuisine        0
        price          0
        dtype: int64

However, when I show the first 5 rows, the problem persists.

In [ ]: restaurants_df.head()
Out[ ]:
    name            year    latitude    longitude     city        region    zipCode   cuisine           price
0   Kilian Stuba    2019    47.348580   10.17114   Kleinwalsertal Austria    87568  Creative              $
1   Pfefferschiff   2019    47.837870   13.07917    Hallwang      Austria    5300   Classic cuisine       $
2   Esszimmer       2019    47.806850   13.03409    Salzburg      Austria    5020   Creative              $
3   Carpe Diem      2019    47.800010   13.04006    Salzburg      Austria    5020   Market cuisine        $
4   Edvard          2019    48.216503   16.36852    Wien          Austria    1010   Modern cuisine  

Any idea why this is happening and how I can solve it? Thanks in advance!

Upvotes: 0

Views: 171

Answers (3)

jayveesea
jayveesea

Reputation: 3199

I don't think pandas will recognize areas with '' as null. for instance:

df2 = pd.DataFrame(np.array([[1, 2, ''], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])

then:

df2.isnull()

    a   b   c
0   False   False   False
1   False   False   False
2   False   False   False

see here, and try:

pandas.options.mode.use_inf_as_na = True

EDIT:

you could also try replaying with:

df2.replace({'': 'No Price'}, inplace=True)

EDIT2: I believe @AKareem has the solution, but to expand you can use this to escape the latex

restaurants_df.replace({'price': {
                       '$': '\$', 
                       '$$': '\$$', 
                       '$$$': '\$$$', 
                       '$$$$': '\$$$$', 
                       '$$$$$': '\$$$$$'}}
           , inplace=True)

Upvotes: 0

A Kareem
A Kareem

Reputation: 608

Viewing the dataset over at kaggle shows that the first four restaurants are 5 '$' while the fifth is 4 '$'. Thus, I'm guessing that jupyter notebook is just not displaying all the '$' visually, however the data internally is correct.

To double check if I'm correct try running

df.price

and see what you get. I think this might have something to do with jupyter's HTML handler when it tries to display four dollar signs. You can look at this issue that is similar to yours

If you're bothered by this, simplay replace the '$' symbols with a number using something like

df.replace({'price': {'$': 1, '$$': 2, '$$$': 3, '$$$$': 4, '$$$$$': 5}})

Upvotes: 1

Moosa Saadat
Moosa Saadat

Reputation: 1167

What I understand is that you are dealing with both blank values and null values. These are handled differently. Check out this question to understand how to handle them.

Upvotes: 1

Related Questions