Bluetail

Reputation: 1291

Problem with changing NaN values to 0 in a column of a pandas dataframe

I am trying to understand how this works.

I have this df.

   ticket_id                      address grafitti_status
0     284932  10041 roseberry, Detroit MI             NaN
1     285362  18520 evergreen, Detroit MI             NaN
2     285361  18520 evergreen, Detroit MI             NaN
3     285338     1835 central, Detroit MI             NaN
4     285346     1700 central, Detroit MI             NaN
5     285345     1700 central, Detroit MI             NaN


where

In: df.grafitti_status.unique()
Out: array([nan, 'GRAFFITI TICKET'], dtype=object)

So I am trying to change NaN to 0 and 'GRAFFITI TICKET' to 1.

I used

df.loc[df['grafitti_status'] == 'GRAFFITI TICKET', 'grafitti_status'] = 1

which works fine, but the same approach for setting NaN to 0

df.loc[df['grafitti_status'] == np.nan, 'grafitti_status'] = 0

Out: array([nan, 1], dtype=object)

does not work: the NaN values still remain.

and

df['grafitti_status'] = df['grafitti_status'].replace({np.nan:0,'GRAFFITI TICKET':1},inplace=True)

does not work either; it replaces everything with None.

   ticket_id                      address grafitti_status
0     284932  10041 roseberry, Detroit MI            None
1     285362  18520 evergreen, Detroit MI            None
2     285361  18520 evergreen, Detroit MI            None
3     285338     1835 central, Detroit MI            None
4     285346     1700 central, Detroit MI            None
5     285345     1700 central, Detroit MI            None
6     285347     1700 central, Detroit MI            None

Can anybody give me any insight into why it works this way?

I have finally found that I can achieve the desired result with

df.loc[df['grafitti_status'] == 'GRAFFITI TICKET', 'grafitti_status'] = 1
df['grafitti_status'] = df['grafitti_status'].fillna(0)

Out: array([0, 1], dtype=int64)

but this leads to the following warning message.

C:\Users\Maria\Anaconda3\lib\site-packages\pandas\core\indexing.py:543: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s
C:\Users\Maria\Anaconda3\lib\site-packages\ipykernel_launcher.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

So I am still not sure what the correct way to do this would be.

Upvotes: 0

Views: 307

Answers (1)

BENY

Reputation: 323226

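Because NaN never compares equal to anything, including itself,

np.nan == np.nan returns False

so the == np.nan mask matches no rows. Use isna to build the mask instead: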

df.loc[df['grafitti_status'].isna(), 'grafitti_status'] = 0
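To make the difference concrete, here is a minimal sketch on a toy frame. The four rows are made up for illustration, the names eq_mask, na_mask, out and alt are hypothetical, and the column is built with dtype=object to mirror the dtype shown in the question. It assumes the column only ever holds NaN or 'GRAFFITI TICKET'.

```python
import numpy as np
import pandas as pd

# Toy data mirroring the question's column (rows are illustrative, not the real data).
df = pd.DataFrame({
    "ticket_id": [284932, 285362, 285361, 285338],
    "grafitti_status": pd.Series(
        [np.nan, "GRAFFITI TICKET", np.nan, "GRAFFITI TICKET"], dtype=object
    ),
})

# NaN is never equal to anything, itself included, so this mask is all False.
eq_mask = df["grafitti_status"] == np.nan

# isna() tests for missing values directly, so this mask marks the NaN rows.
na_mask = df["grafitti_status"].isna()

# Work on an explicit copy so the assignments target a standalone frame.
out = df.copy()
out.loc[na_mask, "grafitti_status"] = 0
out.loc[out["grafitti_status"] == "GRAFFITI TICKET", "grafitti_status"] = 1
out["grafitti_status"] = out["grafitti_status"].astype(int)

# Alternative when the column is strictly "NaN or one label":
# notna() gives booleans, and casting them to int gives 0/1 in one step.
alt = df["grafitti_status"].notna().astype(int)
```

As for the other two symptoms in the question: replace(..., inplace=True) modifies the Series and returns None, so assigning its result back to the column fills it with None. Either drop inplace=True or drop the assignment. The SettingWithCopyWarning suggests that df was itself created as a slice of another frame; if so, taking an explicit .copy() when creating it, as in the sketch, is the usual way to avoid the warning.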

Upvotes: 2
