Carl
Carl

Reputation: 33

Pandas read excel using right datatype and convert to CSV

I'm a Python beginner. I have multiple XLSX input files and I would like to read the XLSX in Pandas dataframe, check that fields I'm intersted in have the right datatype and then convert the dataframes in CSV.

Actually I'm using the following script:

# import needed modules
import pandas as pd
import numpy as np
import os

# Select the input folder 
GMR_folder = r'C:\Users\Me\Desktop\MyFolder'

# Read all the files within the folder 
files = os.listdir(GMR_folder)

# Read xlsx files within the folder 
files_xls = [f for f in files if f[-4:] == 'xlsx']

for file in files_xls:
    last_path = file
    member = (file[:-5])
    file_path = GMR_folder + "\\" + last_path
    # print(file_path)
    
    # Read the excel using specified datatype for the specified column 
    dataExcel = pd.read_excel(file_path, 
                         skiprows=range(0,3)

                        # define the datatype for each column we're intersted in 
                         dtype={'Col1':np.str,
                                'Col2':np.str,
                                'Col3':np.str,
                                'Col4':np.str,
                                'Col5':np.str,
                                'GPS Latitude (DD format) *': np.float32, 
                                'GPS Longitude (DD format) *': np.float32,
                                'GPS Latitude (Degrés décimaux) *':np.float32,
                                'GPS Longitude (Degrés décimaux) *':np.float32,
                                "Col10":np.int64,
                                "Col11":np.float64, 
                                "Col12":np.float64, 
                                "Col13":np.int64,
                                "Col14":np.float64,
                                "Col15":np.float64
                               })
                         
    
    # Insert the member ID in Col1
    dataExcel["Col1"] = member
    
    # Export the dataframe into a csv using the right encoding, useful to avoid strange char
    dataExcel.to_csv(member+'.csv',
                     encoding="utf-8-sig")

    print(member + ' csv created')

print( 'all csv created')

Actually the script works well for some of the XLSX, but for others i have the following error:

Unable to convert column Col13 to type <class 'numpy.int64'>

it also happen with other columns that needs to be converted to float32.

How could I fix this error? It would be great to have NA values in the rows which can't be converted to the right datatype. How could I do it?

Upvotes: 0

Views: 106

Answers (1)

Back2Basics
Back2Basics

Reputation: 7806

Check out the documentation on Pandas new types vs Numpy's types. https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html You will need to specify them as "Int64" (with a capital I to represent Pandas new Nullable types)

Upvotes: 1

Related Questions