Reputation: 32286
I am getting an error while trying to save the dataframe as a file.
from fastparquet import write
write('profile_dtl.parq', df)
The error is related to "date" and the error message looks like this...
ValueError: Can't infer object conversion type: 0 1990-01-01
1 1954-01-01
2 1981-11-15
3 1993-01-21
4 1948-01-01
5 1977-01-01
6 1968-04-28
7 1969-01-01
8 1989-01-01
9 1985-01-01
Name: dob, dtype: object
I have checked that the column's dtype is "object", just like the other columns, which serialize without any problem. If I remove the "dob" column from the dataframe, the write call works. It also works if the column holds date+time values.
Does fastparquet not accept date-only values?
Upvotes: 1
Views: 5234
Reputation: 388
Try changing dob to datetime64 dtype:
import pandas as pd
dob = pd.Series(['1954-01-01', '1981-11-15', '1993-01-21', '1948-01-01',
'1977-01-01', '1968-04-28', '1969-01-01', '1989-01-01',
'1985-01-01'], name='dob')
Out:
0 1954-01-01
1 1981-11-15
2 1993-01-21
3 1948-01-01
4 1977-01-01
5 1968-04-28
6 1969-01-01
7 1989-01-01
8 1985-01-01
Name: dob, dtype: object
Note the dtype that results:
pd.to_datetime(dob)
Out:
0 1954-01-01
1 1981-11-15
2 1993-01-21
3 1948-01-01
4 1977-01-01
5 1968-04-28
6 1969-01-01
7 1989-01-01
8 1985-01-01
dtype: datetime64[ns]
Using this Series as an index in a DataFrame:
baz = list(range(9))
foo = pd.DataFrame(baz, index=pd.to_datetime(dob), columns=['dob'])
You should be able to save your Parquet file now.
from fastparquet import write
write('foo.parquet', foo)
$ ls -l foo.parquet
-rw-r--r-- 1 moi admin 854 Oct 13 16:44 foo.parquet
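If you would rather keep dob as an ordinary column instead of moving it to the index, the same conversion can be applied in place. This is only a sketch: the frame, its name column and the save helper are made up for illustration, and it assumes the object column holds datetime.date values.

```python
import datetime as dt

import pandas as pd

# Hypothetical frame: 'dob' holds datetime.date objects,
# so pandas reports the column dtype as object.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "dob": [dt.date(1990, 1, 1), dt.date(1954, 1, 1), dt.date(1981, 11, 15)],
})

# Convert the column to a datetime64 dtype; the time part becomes midnight.
df["dob"] = pd.to_datetime(df["dob"])
print(df.dtypes)


def save(frame, path):
    """Write the frame with fastparquet (imported lazily, must be installed)."""
    from fastparquet import write
    write(path, frame)
```

Calling save(df, 'profile_dtl.parq') afterwards is the same write call as in the question, just on the converted frame.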
The dob Series has an object dtype, and you left the object_encoding='infer' argument to fastparquet.write unchanged. So, from the docs:
"The special value 'infer' will cause the type to be guessed from the first ten non-null values."
Fastparquet does not try to infer a date value from what it expects to be one of bytes|utf8|json|bson|bool|int|float.
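To see what is actually stored in an object column, it can help to look at the Python types of its elements. The snippet below is a small diagnostic sketch using a made-up Series. The traceback in the question does not show the element type, so datetime.date here is an assumption about what the column contains.

```python
import datetime as dt

import pandas as pd

# Hypothetical stand-in for the 'dob' column from the question.
dob = pd.Series([dt.date(1990, 1, 1), dt.date(1954, 1, 1)], name="dob")

# The column dtype only says "object"; the element types say what is inside.
element_types = dob.map(type).unique().tolist()
print(dob.dtype, element_types)
```

If the list shows datetime.date rather than one of the types named above, converting the column with pd.to_datetime before writing is the straightforward way out.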
Upvotes: 2