Reputation: 4625
I commented a closed issue in Pandas github repo:
Treating an empty value in Excel as nan
has another side effect: integer will be converted to float. Subsequent operation on that columns will have other effects again.
Also read_excel()
does not honor empty value handling offered by function in converters:
I have an Excel file temp.xlsx
with the following data:
There are white spaces surrounding values in Key3
column.
Key1,Key2,Key3,Key4
0,11, Apple ,1.12
1,12,,1.02
2,13, Orange,
3, ,Banana ,0.01
This is the code:
import numpy as np
import pandas as pd
def handle_string(value):
return value.replace(' ', '')
def handle_integer(value):
if value == '':
return 0
else:
int(value)
def handle_float(value):
if value == '':
return 0.0
else:
float(value)
df = pd.read_excel(
'temp.xlsx',
)
print(df)
print(f"type(df.loc[3,'Key2']) = {type(df.loc[3,'Key2'])}")
print(f"type(df.loc[1,'Key3']) = {type(df.loc[1,'Key3'])}")
print(f"type(df.loc[2,'Key4']) = {type(df.loc[2,'Key4'])}")
print('')
df = pd.read_excel(
'temp.xlsx',
converters={\
'Key1' : handle_integer,
'Key2' : handle_integer,
'Key3' : handle_string,
'Key4' : handle_float,
}
)
print(df)
print(f"type(df.loc[3,'Key2']) = {type(df.loc[3,'Key2'])}")
print(f"type(df.loc[1,'Key3']) = {type(df.loc[1,'Key3'])}")
print(f"type(df.loc[2,'Key4']) = {type(df.loc[2,'Key4'])}")
The output:
Key1 Key2 Key3 Key4
0 0 11.0 Apple 1.12
1 1 12.0 NaN 1.02
2 2 13.0 Orange NaN
3 3 NaN Banana 0.01
type(df.loc[3,'Key2']) = <class 'numpy.float64'>
type(df.loc[1,'Key3']) = <class 'float'>
type(df.loc[2,'Key4']) = <class 'numpy.float64'>
Key1 Key2 Key3 Key4
0 None NaN Apple NaN
1 None NaN NaN NaN
2 None NaN Orange 0.0
3 None 0.0 Banana NaN
type(df.loc[3,'Key2']) = <class 'numpy.float64'>
type(df.loc[1,'Key3']) = <class 'float'>
type(df.loc[2,'Key4']) = <class 'numpy.float64'>
dtype
parameters have lower priority than converters
.
Upvotes: 1
Views: 1384
Reputation: 4265
I could be wrong, but it looks to me like the problem is related to your return values for these functions. In two places, you return None
without apparently intending to. See below:
def handle_string(value):
return value.replace(' ', '')
def handle_integer(value):
if value == '':
return 0
else:
int(value) # Returns none!!!
def handle_float(value):
if value == '':
return 0.0
else:
float(value) # Returns none!!!
Upvotes: 1