Yehuda

Reputation: 1893

Failing np.nan_to_num function

I'm examining the Accidental Drug Related Deaths dataset. The following is a list of all drugs:

 20  Heroin               2529 non-null   object 
 21  Cocaine              1521 non-null   object 
 22  Fentanyl             2232 non-null   object 
 23  FentanylAnalogue     389 non-null    object 
 24  Oxycodone            607 non-null    object 
 25  Oxymorphone          108 non-null    object 
 26  Ethanol              1247 non-null   object 
 27  Hydrocodone          118 non-null    object 
 28  Benzodiazepine       1343 non-null   object 
 29  Methadone            474 non-null    object 
 30  Amphet               159 non-null    object 
 31  Tramad               130 non-null    object 
 32  Morphine_NotHeroin   42 non-null     object 
 33  Hydromorphone        25 non-null     object 
 34  Other                435 non-null    object 
 35  OpiateNOS            88 non-null     object 
 36  AnyOpioid            2466 non-null   object 

The dataset is sparse: each drug column holds Y where that drug was a cause of death and NaN otherwise. For example, the following is deaths['Heroin'].head():

0       NaN
1       NaN
2         Y
3         Y
4       NaN

I'm trying to convert this to

0         0
1         0
2         1
3         1
4         0

To convert the Y to 1, I've used deaths = deaths.replace(to_replace={'Y':1}). I'm now attempting to change the NaN to 0. I'm trying to use np.nan_to_num(), but my code doesn't seem to do anything.

I'm using the following:

deaths.loc[:,'Heroin':'AnyOpioid'] = np.nan_to_num(deaths.loc[:,'Heroin':'AnyOpioid'])

This outputs no change to the original dataset, with deaths['Heroin'].head() appearing as

0       NaN
1       NaN
2         Y
3         Y
4       NaN

(after the prior deaths.replace() function).

What mechanism is causing this? I'm assuming it's related to the .loc, but I'm not sure what to look at first or how to correct it. Removing the .loc gives me a TypeError: cannot do slice indexing on <class 'pandas.core.indexes.range.RangeIndex'> with these indexers [Heroin] of <class 'str'>.

Upvotes: 1

Views: 283

Answers (1)

Ben.T

Reputation: 29635

You can use notna, which returns a Boolean: False if the value is NaN and True if it is anything else (like Y here). To get 0 and 1, convert the result with astype:

deaths['Heroin'] = deaths['Heroin'].notna().astype(int)
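As for why the original call appeared to do nothing: as far as I know, np.nan_to_num only makes replacements in floating-point or complex arrays, and these columns are object dtype, so the values come back unchanged. The sketch below illustrates that and applies the same notna approach to every drug column at once. The small DataFrame is a made-up stand-in for the real dataset, and the names drug_cols and unchanged are just for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset
deaths = pd.DataFrame({
    "Heroin": [np.nan, np.nan, "Y", "Y", np.nan],
    "Cocaine": ["Y", np.nan, np.nan, "Y", np.nan],
    "AnyOpioid": [np.nan, "Y", "Y", "Y", np.nan],
})

# The column is not a float column, so nan_to_num leaves the NaN values in place
unchanged = np.nan_to_num(deaths["Heroin"].to_numpy())

# Select the drug columns by label slice, then map NaN -> 0 and anything else -> 1
drug_cols = deaths.loc[:, "Heroin":"AnyOpioid"].columns
deaths[drug_cols] = deaths[drug_cols].notna().astype(int)

print(deaths)
```

Because notna only checks for missing values, this works whether the cells still hold Y or have already been replaced with 1.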

Upvotes: 1
