Pandas fillna('value') followed by df.replace('value',np.nan) not working

Question

For some reason df.replace() is not working for me after I pivot my data. I am going from long form data to wide form. I want to fill nan values with a dummy value, pivot, then turn the dummy values back into nans using replace, but replace is not working. On further investigation it seems that the 'yy' value is not being recognised as the same as the fillna value so the function cant find anything to replace. e.g.

"Checking again for 'yy' values presence: False"

I don't know what's going on. Note this also still happens for me using inplace = True, regex=True and if I put the find and replace items in a dictionary e.g. {'yy':np.nan}. My real data is being read from an excel sheet using read_excel.

import pandas as pd
import numpy as np

# Load example data
data_as_dict ={'SiteID': {0: 'Somewhere Creek D/S',  1: 'Somewhere Creek D/S',  2: 'Somewhere Creek D/S',  3: 'Somewhere Creek D/S',  4: 'Somewhere Creek D/S',  5: 'Somewhere Creek D/S',  6: 'Somewhere Creek D/S',  7: 'Somewhere Creek D/S',  8: 'Somewhere Creek D/S'}, 'ParameterID': {0: 'EW_APHA1030E.IONBAL',  1: 'EW_APHA1030E.IONBAL',  2: 'EW_APHA1030E.SUM_OF_IONS',  3: 'EW_APHA1030E.SUM_OF_IONS',  4: 'EW_APHA1030E.TFSS',  5: 'EW_APHA2120C_UV.COLOUR_TRUE',  6: 'EW_APHA2130.TURB_BEFORE',  7: 'EW_APHA2320.ALK_BICAR',  8: 'EW_APHA2320.ALK_BICAR'}, 'SampleDate': {0: '2017-04-03 09:30:00',  1: '2019-04-17 13:30:00',  2: '2017-04-03 09:30:00',  3: '2017-04-03 09:30:01',  4: '2017-04-03 09:30:00',  5: '2017-04-03 09:30:00',  6: '2017-04-03 09:30:00',  7: '2017-04-03 09:30:00',  8: '2019-04-17 13:30:00'}, 'Reading': {0: 15.0,  1: -0.7,  2: 278.0,  3: 975.0,  4: 278.0,  5: 35.0,  6: 20.0,  7: 98.0,  8: 230.0}, 'SampledBy': {0: 'dafdsfd',  1: np.nan,  2: 'dafdsfd',  3: np.nan,  4: 'dafdsfd',  5: 'dafdsfd',  6: 'dafdsfd',  7: 'dafdsfd',  8: np.nan}, 'LabID': {0: 'dagfdfda',  1: np.nan,  2: 'dagfdfda',  3: np.nan,  4: 'dagfdfda',  5: 'dagfdfda',  6: 'dagfdfda',  7: 'dagfdfda',  8: np.nan}, 'Overflow': {0: np.nan,  1: np.nan,  2: np.nan,  3: np.nan,  4: np.nan,  5: np.nan,  6: np.nan,  7: np.nan,  8: np.nan}, 'Symbol': {0: '%',  1: '%',  2: 'mg/L',  3: 'mg/L',  4: 'mg/L',  5: 'Hazen',  6: 'NTU',  7: 'mg/L',  8: 'mg/L'}, 'Description': {0: 'Anion-Cation Balance',  1: 'Anion-Cation Balance',  2: 'Sum of Ions',  3: 'Sum of Ions',  4: 'TFSS',  5: 'Colour (True)',  6: 'Turbidity',  7: 'Bicarbonate Alkalinity as CaCO3',  8: 'Bicarbonate Alkalinity as CaCO3'}, 'Parameter': {0: 'Anion-Cation Balance %',  1: 'Anion-Cation Balance %',  2: 'Sum of Ions mg/L',  3: 'Sum of Ions mg/L',  4: 'TFSS mg/L',  5: 'Colour (True) Hazen',  6: 'Turbidity NTU',  7: 'Bicarbonate Alkalinity as CaCO3 mg/L',  8: 'Bicarbonate Alkalinity as CaCO3 mg/L'}}

df=pd.DataFrame.from_dict(data_as_dict)

# Replace NaNs with "yy" 
df = df.fillna("yy")


# Pivot the data
dsf = df.pivot(index=['SiteID', 'ParameterID', 'SampleDate', 'SampledBy', 'LabID', 'Overflow', 'Symbol',  'Description' ], columns='Parameter', values='Reading')

# replace values
dsf = dsf.replace('yy',np.nan)

# Check for 'yy' values
print("Checking for 'yy' values presence:")
print((dsf == 'yy').any().any())

CrisPlusPlus · Accepted Answer

The function replace searches for matches in the values of a dataframe. In principle, the columns of df that contain NaNs are given by all the nonzero values of df.isna().sum():

SiteID             0
ParameterID        0
SampleDate         0
Reading            0
SampledBy          3
LabID              3
Overflow           9
Easting            0
Northing           0
RL                 9
Type               9
SiteDescription    9
Status             0
Unit               0
Symbol             0
Format             9
Description        0
Overflow2          9
Overflow3          9
Overflow8          9
Lat                0
Lon                0
Zone               9
FlwTo              9
Marker             9
Parameter          0
dtype: int64

Then, pivoting your dataframe loses all the "yy" values, as you can confirm by running

for col in dsf:
    print(dsf[col].values)

right after the first assignment of dsf. This happens because when pivoting, you are making the columns of df that contain "yy"s part of the MultiIndex of the pivoted table, hence not seen by replace, or by any bool operation acting on the values, such as dsf == 'something'

The solution is to rethink how you pivot your original dataframe and to make sure you understand how to use replace.

Pandas fillna('value') followed by df.replace('value',np.nan) not working

Answers (1)

Related Questions

Pandas fillna(&#39;value&#39;) followed by df.replace(&#39;value&#39;,np.nan) not working

Answers (1)

Related Questions

Pandas fillna('value') followed by df.replace('value',np.nan) not working