Reputation: 578
For some reason df.replace() is not working for me after I pivot my data. I am going from long form data to wide form. I want to fill nan values with a dummy value, pivot, then turn the dummy values back into nans using replace, but replace is not working. On further investigation it seems that the 'yy' value is not being recognised as the same as the fillna value so the function cant find anything to replace. e.g.
"Checking again for 'yy' values presence: False"
I don't know what's going on. Note this also still happens for me using inplace = True, regex=True and if I put the find and replace items in a dictionary e.g. {'yy':np.nan}. My real data is being read from an excel sheet using read_excel.
import pandas as pd
import numpy as np
# Load example data
data_as_dict ={'SiteID': {0: 'Somewhere Creek D/S', 1: 'Somewhere Creek D/S', 2: 'Somewhere Creek D/S', 3: 'Somewhere Creek D/S', 4: 'Somewhere Creek D/S', 5: 'Somewhere Creek D/S', 6: 'Somewhere Creek D/S', 7: 'Somewhere Creek D/S', 8: 'Somewhere Creek D/S'}, 'ParameterID': {0: 'EW_APHA1030E.IONBAL', 1: 'EW_APHA1030E.IONBAL', 2: 'EW_APHA1030E.SUM_OF_IONS', 3: 'EW_APHA1030E.SUM_OF_IONS', 4: 'EW_APHA1030E.TFSS', 5: 'EW_APHA2120C_UV.COLOUR_TRUE', 6: 'EW_APHA2130.TURB_BEFORE', 7: 'EW_APHA2320.ALK_BICAR', 8: 'EW_APHA2320.ALK_BICAR'}, 'SampleDate': {0: '2017-04-03 09:30:00', 1: '2019-04-17 13:30:00', 2: '2017-04-03 09:30:00', 3: '2017-04-03 09:30:01', 4: '2017-04-03 09:30:00', 5: '2017-04-03 09:30:00', 6: '2017-04-03 09:30:00', 7: '2017-04-03 09:30:00', 8: '2019-04-17 13:30:00'}, 'Reading': {0: 15.0, 1: -0.7, 2: 278.0, 3: 975.0, 4: 278.0, 5: 35.0, 6: 20.0, 7: 98.0, 8: 230.0}, 'SampledBy': {0: 'dafdsfd', 1: np.nan, 2: 'dafdsfd', 3: np.nan, 4: 'dafdsfd', 5: 'dafdsfd', 6: 'dafdsfd', 7: 'dafdsfd', 8: np.nan}, 'LabID': {0: 'dagfdfda', 1: np.nan, 2: 'dagfdfda', 3: np.nan, 4: 'dagfdfda', 5: 'dagfdfda', 6: 'dagfdfda', 7: 'dagfdfda', 8: np.nan}, 'Overflow': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan, 6: np.nan, 7: np.nan, 8: np.nan}, 'Symbol': {0: '%', 1: '%', 2: 'mg/L', 3: 'mg/L', 4: 'mg/L', 5: 'Hazen', 6: 'NTU', 7: 'mg/L', 8: 'mg/L'}, 'Description': {0: 'Anion-Cation Balance', 1: 'Anion-Cation Balance', 2: 'Sum of Ions', 3: 'Sum of Ions', 4: 'TFSS', 5: 'Colour (True)', 6: 'Turbidity', 7: 'Bicarbonate Alkalinity as CaCO3', 8: 'Bicarbonate Alkalinity as CaCO3'}, 'Parameter': {0: 'Anion-Cation Balance %', 1: 'Anion-Cation Balance %', 2: 'Sum of Ions mg/L', 3: 'Sum of Ions mg/L', 4: 'TFSS mg/L', 5: 'Colour (True) Hazen', 6: 'Turbidity NTU', 7: 'Bicarbonate Alkalinity as CaCO3 mg/L', 8: 'Bicarbonate Alkalinity as CaCO3 mg/L'}}
df=pd.DataFrame.from_dict(data_as_dict)
# Replace NaNs with "yy"
df = df.fillna("yy")
# Pivot the data
dsf = df.pivot(index=['SiteID', 'ParameterID', 'SampleDate', 'SampledBy', 'LabID', 'Overflow', 'Symbol', 'Description' ], columns='Parameter', values='Reading')
# replace values
dsf = dsf.replace('yy',np.nan)
# Check for 'yy' values
print("Checking for 'yy' values presence:")
print((dsf == 'yy').any().any())
Upvotes: 0
Views: 111
Reputation: 2302
The function replace
searches for matches in the values
of a dataframe. In principle, the columns of df
that contain NaNs
are given by all the nonzero values of df.isna().sum()
:
SiteID 0
ParameterID 0
SampleDate 0
Reading 0
SampledBy 3
LabID 3
Overflow 9
Easting 0
Northing 0
RL 9
Type 9
SiteDescription 9
Status 0
Unit 0
Symbol 0
Format 9
Description 0
Overflow2 9
Overflow3 9
Overflow8 9
Lat 0
Lon 0
Zone 9
FlwTo 9
Marker 9
Parameter 0
dtype: int64
Then, pivot
ing your dataframe loses all the "yy"
values, as you can confirm by running
for col in dsf:
print(dsf[col].values)
right after the first assignment of dsf
. This happens because when pivot
ing, you are making the columns of df
that contain "yy"
s part of the MultiIndex
of the pivoted table, hence not seen by replace
, or by any bool operation acting on the values, such as dsf == 'something'
The solution is to rethink how you pivot your original dataframe and to make sure you understand how to use replace
.
Upvotes: 2