Reputation: 748
I'm using the pandas read_csv function, and from time to time a column has no values at all. In that case the data type passed through the dtype parameter seems to be ignored.
import pandas as pd
df = pd.read_csv("example.csv", dtype={"col1": "str", "col2": "float", "col3": "str"})
df.to_parquet("example.parquet")
This is the CSV file I used:
col1,col2,col3
A,1,
B,2,
C,3,
I expect col3 to be a string type in the Parquet file, but instead it is INT32.
Upvotes: 0
Views: 2144
Reputation: 8816
Try the code below to avoid the problem when a column has no values. Calling fillna('') replaces the missing values with empty strings.
import pandas as pd
df = pd.read_csv("example.csv", dtype={"col1": "str", "col2": "float", "col3": "str"}).fillna('')
df.to_parquet("example.parquet")
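One caveat: fillna('') on the whole frame would also put empty strings into numeric columns that happen to contain missing values. A minimal sketch of a narrower variant follows, assuming only the columns declared as "str" should be filled. The in-memory csv_text stands in for example.csv, and the names dtypes, str_cols and col3_all_missing are just illustrative.

```python
import io
import pandas as pd

# In-memory copy of the CSV from the question (stands in for example.csv).
csv_text = "col1,col2,col3\nA,1,\nB,2,\nC,3,\n"
dtypes = {"col1": "str", "col2": "float", "col3": "str"}

df = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)

# Empty fields are read as missing values, so col3 contains no strings yet.
col3_all_missing = bool(df["col3"].isna().all())

# Fill only the columns declared as "str"; numeric columns are left untouched.
str_cols = [name for name, kind in dtypes.items() if kind == "str"]
df[str_cols] = df[str_cols].fillna("")

print("col3 was entirely missing:", col3_all_missing)
print(df["col3"].tolist())
```

After this step df.to_parquet("example.parquet") can be called as before. col3 now holds real (empty) strings instead of only missing values.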
Upvotes: 1