Reputation: 33
I'm a Python beginner. I have multiple XLSX input files and I would like to read the XLSX in Pandas dataframe, check that fields I'm intersted in have the right datatype and then convert the dataframes in CSV.
Actually I'm using the following script:
# import needed modules
import pandas as pd
import numpy as np
import os
# Select the input folder
GMR_folder = r'C:\Users\Me\Desktop\MyFolder'
# Read all the files within the folder
files = os.listdir(GMR_folder)
# Read xlsx files within the folder
files_xls = [f for f in files if f[-4:] == 'xlsx']
for file in files_xls:
last_path = file
member = (file[:-5])
file_path = GMR_folder + "\\" + last_path
# print(file_path)
# Read the excel using specified datatype for the specified column
dataExcel = pd.read_excel(file_path,
skiprows=range(0,3)
# define the datatype for each column we're intersted in
dtype={'Col1':np.str,
'Col2':np.str,
'Col3':np.str,
'Col4':np.str,
'Col5':np.str,
'GPS Latitude (DD format) *': np.float32,
'GPS Longitude (DD format) *': np.float32,
'GPS Latitude (Degrés décimaux) *':np.float32,
'GPS Longitude (Degrés décimaux) *':np.float32,
"Col10":np.int64,
"Col11":np.float64,
"Col12":np.float64,
"Col13":np.int64,
"Col14":np.float64,
"Col15":np.float64
})
# Insert the member ID in Col1
dataExcel["Col1"] = member
# Export the dataframe into a csv using the right encoding, useful to avoid strange char
dataExcel.to_csv(member+'.csv',
encoding="utf-8-sig")
print(member + ' csv created')
print( 'all csv created')
Actually the script works well for some of the XLSX, but for others i have the following error:
Unable to convert column Col13 to type <class 'numpy.int64'>
it also happen with other columns that needs to be converted to float32.
How could I fix this error? It would be great to have NA values in the rows which can't be converted to the right datatype. How could I do it?
Upvotes: 0
Views: 106
Reputation: 7806
Check out the documentation on Pandas new types vs Numpy's types. https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html You will need to specify them as "Int64" (with a capital I to represent Pandas new Nullable types)
Upvotes: 1