Handling empty value in Excel by pandas read_excel()

Question

I commented a closed issue in Pandas github repo:

Treating an empty value in Excel as nan has another side effect: integer will be converted to float. Subsequent operation on that columns will have other effects again.

Also read_excel() does not honor empty value handling offered by function in converters:

I have an Excel file temp.xlsx with the following data:

There are white spaces surrounding values in Key3 column.

Key1,Key2,Key3,Key4
0,11,  Apple  ,1.12
1,12,,1.02
2,13,  Orange,
3,  ,Banana  ,0.01

This is the code:

import numpy as np
import pandas as pd

def handle_string(value):
    return value.replace(' ', '')

def handle_integer(value):
    if value == '':
        return 0
    else:
        int(value)

def handle_float(value):
    if value == '':
        return 0.0
    else:
        float(value)

df = pd.read_excel(
        'temp.xlsx',
)
print(df)
print(f"type(df.loc[3,'Key2']) = {type(df.loc[3,'Key2'])}")
print(f"type(df.loc[1,'Key3']) = {type(df.loc[1,'Key3'])}")
print(f"type(df.loc[2,'Key4']) = {type(df.loc[2,'Key4'])}")

print('')

df = pd.read_excel(
        'temp.xlsx',
        converters={\
            'Key1' : handle_integer,
            'Key2' : handle_integer,
            'Key3' : handle_string,
            'Key4' : handle_float,
        }
)
print(df)
print(f"type(df.loc[3,'Key2']) = {type(df.loc[3,'Key2'])}")
print(f"type(df.loc[1,'Key3']) = {type(df.loc[1,'Key3'])}")
print(f"type(df.loc[2,'Key4']) = {type(df.loc[2,'Key4'])}")

The output:

   Key1  Key2        Key3  Key4
0     0  11.0     Apple    1.12
1     1  12.0         NaN  1.02
2     2  13.0    Orange     NaN
3     3   NaN    Banana    0.01
type(df.loc[3,'Key2']) = 
type(df.loc[1,'Key3']) = 
type(df.loc[2,'Key4']) = 

   Key1  Key2    Key3  Key4
0  None   NaN   Apple   NaN
1  None   NaN     NaN   NaN
2  None   NaN  Orange   0.0
3  None   0.0  Banana   NaN
type(df.loc[3,'Key2']) = 
type(df.loc[1,'Key3']) = 
type(df.loc[2,'Key4']) =

dtype parameters have lower priority than converters.

Handling empty value in Excel by pandas read_excel()

Answers (1)

Related Questions