ZCoder

Reputation: 2349

Error copying on-premise SQL data as Parquet

I'm currently unable to copy data from an on-premise SQL Server through a self-hosted Integration Runtime to Azure Blob Storage in Parquet format using the ADF V2 Copy Activity. The latest JRE is installed on the IR machine. I'm getting this error:

{ 
"errorCode": "2200", 
"message": "Failure happened on 'Sink' side. ErrorCode=UserErrorJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.lang.UnsatisfiedLinkError:no snappyjava in java.library.path\ntotal entry:18\r\njava.lang.ClassLoader.loadLibrary(Unknown Source)\r\njava.lang.Runtime.loadLibrary0(Unknown Source)\r\njava.lang.System.loadLibrary(Unknown Source)\r\norg.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:170)\r\norg.xerial.snappy.SnappyLoader.load(SnappyLoader.java:145)\r\norg.xerial.snappy.Snappy.<clinit>(Snappy.java:47)\r\norg.apache.parquet.hadoop.codec.SnappyCompressor.compress(SnappyCompressor.java:67)\r\norg.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)\r\norg.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)\r\norg.apache.parquet.hadoop.CodecFactory$BytesCompressor.compress(CodecFactory.java:112)\r\norg.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:89)\r\norg.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:152)\r\norg.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:240)\r\norg.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:126)\r\norg.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:164)\r\norg.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113)\r\norg.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:297)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetWriterBridge.close(ParquetWriterBridge.java:29)\r\n,Source=Microsoft.DataTransfer.Common,''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,'", 
"failureType": "UserError", 
"target": "CopyMetDBTableToBlob" 
}

I have tested copying data from on-premise Oracle and Informix sources to Azure Blob Storage in Parquet format using the ADF V2 Copy Activity and it works. I'm only having this issue with the on-premise SQL Server.
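
For reference, the copy activity is shaped roughly like this (a minimal sketch; the dataset names are placeholders, not my real ones):

{
    "name": "CopyMetDBTableToBlob",
    "type": "Copy",
    "inputs": [
        { "referenceName": "OnPremSqlTableDataset", "type": "DatasetReference" }
    ],
    "outputs": [
        { "referenceName": "BlobParquetDataset", "type": "DatasetReference" }
    ],
    "typeProperties": {
        "source": { "type": "SqlSource" },
        "sink": { "type": "ParquetSink" }
    }
}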

Upvotes: 0

Views: 1601

Answers (2)

gip

Reputation: 103

I ran into a similar issue. It turns out that copying to Parquet fails if the source table has any column names containing spaces.

So before the copy operation, I check whether any of the column names contain spaces. If they do, instead of specifying the table name in the dataset, I use a query and write a SELECT statement that aliases those columns to names without spaces.
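
For example, in the copy activity source (the table and column names here are made up):

"source": {
    "type": "SqlSource",
    "sqlReaderQuery": "SELECT [Customer Name] AS CustomerName, [Order Date] AS OrderDate FROM dbo.Orders"
}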

Upvotes: 0

Martin Esteban Zurita

Reputation: 3209

I don't know if you have already checked this out, but there is a section about using the Parquet file format with an on-premise (self-hosted) IR: https://learn.microsoft.com/en-us/azure/data-factory/format-parquet#using-self-hosted-integration-runtime

I wouldn't recommend using Parquet with Data Factory, as it doesn't partition the dataset by the values of a column (the way you can in Python, for example). I've also had problems uploading big datasets (30 GB or more) in this format; it always seemed kind of buggy to me.

I'd always go for compressed CSV unless I have no choice.
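
For example, a gzip-compressed CSV sink dataset looks roughly like this (the container, folder and linked service names are placeholders):

{
    "name": "BlobCompressedCsvDataset",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "output",
                "folderPath": "csv"
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true,
            "compressionCodec": "gzip"
        }
    }
}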

Hope this helped!

Upvotes: 0
