Reputation: 91
I'm getting the following error: Error during parsing. repetition constraint is more restrictive: can not merge type required binary MyTime into optional binary MyTime. Maybe one of the files is corrupted, but I don't know how to skip it.
Thanks
Upvotes: 0
Views: 1355
Reputation: 823
This happens when reading multiple Parquet files whose schemas differ slightly in their metadata. Here the difference is the repetition of a column: MyTime is declared required in some files and optional in others. Either you have a mixed collection of files in a single directory, or you are giving the LOAD statement a glob and the set of files it matches is mixed in this respect.
Rather than specifying the schema in an AS() clause, or calling the loader function with no arguments, the solution is to override the schema in the loader function's argument like this:
data = LOAD 'data'
    USING parquet.pig.ParquetLoader('n1:int, n2:float, n3:double, n4:long');
Otherwise, the loader function infers the schema from the first file it encounters, and that schema then conflicts with one of the other files.
If you still have trouble, try using type bytearray in the schema specification and then cast to the desired types in a subsequent FOREACH.
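A minimal sketch of that bytearray-then-cast fallback, applied to the MyTime column from the question. It assumes the files live under 'data' (as in the LOAD above) and that MyTime holds text, hence the cast to chararray; both are assumptions, so adjust them to your data. It keeps the parquet.pig.ParquetLoader class name used above; newer Parquet releases package the loader as org.apache.parquet.pig.ParquetLoader. I have not run this against a mixed set of files, and the cast only succeeds if the loader can convert bytearray values.

```pig
-- Request MyTime as an untyped bytearray (the fallback described above);
-- 'data' and the single-column schema are assumptions for this sketch.
raw = LOAD 'data'
    USING parquet.pig.ParquetLoader('MyTime:bytearray');

-- Cast to the type you actually want; chararray assumes MyTime holds text.
typed = FOREACH raw GENERATE (chararray) MyTime AS MyTime;

-- Print the resulting schema to confirm the cast was applied.
DESCRIBE typed;
```

If you need more columns, list each one in the loader's schema string as bytearray and add a matching cast in the FOREACH.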
According to the Parquet source code, there is another argument to the loader function that allows columns to be matched by position rather than by name (the default), but I have not experimented with that.
Upvotes: 2