Reputation: 2349
At the moment I am unable to copy data from an on-premises SQL Server to Azure Blob Storage in Parquet format through the self-hosted Integration Runtime, using an ADF V2 Copy Activity. The latest JRE is installed on the IR machine. I am getting this error:
{
"errorCode": "2200",
"message": "Failure happened on 'Sink' side. ErrorCode=UserErrorJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.lang.UnsatisfiedLinkError:no snappyjava in java.library.path\ntotal entry:18\r\njava.lang.ClassLoader.loadLibrary(Unknown Source)\r\njava.lang.Runtime.loadLibrary0(Unknown Source)\r\njava.lang.System.loadLibrary(Unknown Source)\r\norg.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:170)\r\norg.xerial.snappy.SnappyLoader.load(SnappyLoader.java:145)\r\norg.xerial.snappy.Snappy.<clinit>(Snappy.java:47)\r\norg.apache.parquet.hadoop.codec.SnappyCompressor.compress(SnappyCompressor.java:67)\r\norg.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)\r\norg.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)\r\norg.apache.parquet.hadoop.CodecFactory$BytesCompressor.compress(CodecFactory.java:112)\r\norg.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:89)\r\norg.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:152)\r\norg.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:240)\r\norg.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:126)\r\norg.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:164)\r\norg.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113)\r\norg.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:297)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetWriterBridge.close(ParquetWriterBridge.java:29)\r\n,Source=Microsoft.DataTransfer.Common,''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,'",
"failureType": "UserError",
"target": "CopyMetDBTableToBlob"
}
I have tested copying data from on-premises Oracle and Informix to Azure Blob Storage in Parquet format using an ADF V2 Copy Activity, and it works. I am only having this issue with the on-premises SQL Server.
Upvotes: 0
Views: 1601
Reputation: 103
I ran into a similar issue. It turns out that the copy to Parquet fails if the source table has any column names containing spaces.
So before the copy operation, I check whether any of the column names have spaces. If they do, instead of specifying the table name in the dataset, I use a query and write a SELECT statement that aliases those columns to names without spaces, as in the sketch below.
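For example, a minimal sketch of what the Copy Activity source could look like in the pipeline JSON, assuming a hypothetical dbo.Orders table with [Customer Name] and [Order Date] columns (the table and column names here are purely illustrative):
"source": {
    "type": "SqlSource",
    "sqlReaderQuery": "SELECT [Customer Name] AS CustomerName, [Order Date] AS OrderDate FROM dbo.Orders"
}
The bracketed originals keep SQL Server happy with the spaces, while the aliases give the Parquet writer space-free column names.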
Upvotes: 0
Reputation: 3209
I don't know if you have already checked this out, but there is a section about using the Parquet file format with an on-premises IR: https://learn.microsoft.com/en-us/azure/data-factory/format-parquet#using-self-hosted-integration-runtime
I wouldn't recommend using Parquet with Data Factory, as it doesn't split the dataset into partitions by the values of a column (the way Python libraries can, for example). I've also had problems uploading large datasets (30 GB or more) in this format; it has always seemed somewhat buggy to me.
I'd always go for compressed CSV unless I have no choice.
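As a rough sketch of that alternative, a gzip-compressed delimited-text (CSV) sink dataset could look something like the following; the dataset name, linked service reference, container, and folder path are all placeholders, not values from the original setup:
{
    "name": "CompressedCsvOutput",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLS",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "output",
                "folderPath": "exports"
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true,
            "compressionCodec": "gzip"
        }
    }
}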
Hope this helped!
Upvotes: 0