Ori N

Reputation: 748

pandas CSV to Parquet data type is not set correctly when column has no values

I'm using the pandas read_csv function, and from time to time some columns have no values. In this case the data type passed via the dtype parameter is ignored.

import pandas as pd
df = pd.read_csv("example.csv", dtype={"col1": "str", "col2": "float", "col3": "str"})
df.to_parquet("example.parquet")

This is the CSV file I used:

col1,col2,col3
A,1,
B,2,
C,3,

I expect col3 to be a string type in the Parquet file, matching the dtype I passed, but instead it is INT32.

Upvotes: 0

Views: 2144

Answers (1)

Karn Kumar

Reputation: 8816

Try the code below to avoid the problem when a column has no values.

import pandas as pd
df = pd.read_csv("example.csv", dtype={"col1": "str", "col2": "float", "col3": "str"}).fillna('')
df.to_parquet("example.parquet")

Upvotes: 1
