Harsh Kishore

Reputation: 11

MS Azure Data Factory (ADF) copy activity from Blob Storage to Azure PostgreSQL (Gen5, 8 cores) fails with a "connection closed by host" error

I am using an ADF copy activity to copy files from Azure Blob Storage to Azure Database for PostgreSQL. It is a recursive copy, i.e. there are multiple files within the folder; the 5 files I have to copy total around 6 GB. The activity fails after 30-60 minutes of running.

I have tried write batch sizes from 100 to 500, but it still fails. I have tried 4, 8, or auto DIUs, and similarly 1, 2, 4, 8, or auto parallel connections to Postgres (normally it seems to use 1 per source file). The Azure PostgreSQL server has 8 cores; its temp buffer size is 8192 kB and the maximum allowed is around 16000 kB, which I also tried. The Microsoft support team suggested using the retry option, and I am still waiting to hear back from their PG team, but these are the 2 errors I keep getting:

{ "errorCode": "2200", "message": "'Type=Npgsql.NpgsqlException,Message=Exception while reading from stream,Source=Npgsql,''Type=System.IO.IOException,Message=Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.,Source=System,''Type=System.Net.Sockets.SocketException,Message=An existing connection was forcibly closed by the remote host,Source=System,'", "failureType": "UserError", "target": "csv to pg staging data migration", "details": [] }

or

Operation on target csv to pg staging data migration failed: 'Type=Npgsql.NpgsqlException,Message=Exception while flushing stream,Source=Npgsql,''Type=System.IO.IOException,Message=Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host.,Source=System,''Type=System.Net.Sockets.SocketException,Message=An existing connection was forcibly closed by the remote host,Source=System

Upvotes: 1

Views: 1449

Answers (1)

dabears318

Reputation: 21

I was also facing this issue recently and contacted our Microsoft rep, who got back to me with the following update on 2020-01-16:

“This is another issue we found in the driver, we just finished our deployment yesterday to fix this issue by upgrading driver version. Now customer can have up to 32767 columns data in one batch size(which is the limitation in PostgreSQL, we can’t exceed that).

Please let customer make sure that (Write batch size * column size) < 32767 as I mentioned, otherwise they will face the limitation."

"Column size" refers to the count of columns in the table. The "area" (row write batch size * column count) cannot be greater than 32,767.

I was able to set the write batch size on my ADF copy activity to a dynamic formula that picks an appropriate batch size per table:

@div(32766,length(pipeline().parameters.config))

pipeline().parameters.config refers to an array containing information about the table's columns, so the length of the array equals the number of columns in the table.
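As a sanity check on the numbers, here is the same arithmetic as the expression above written in plain Python (illustration only, not ADF code); the helper name and the assumption that the limit applies to rows-per-batch times columns are mine:

    # Illustration only: the per-batch limit described above,
    # assuming it applies to (rows per batch) * (columns per row).
    POSTGRES_PARAM_LIMIT = 32767

    def max_write_batch_size(column_count: int) -> int:
        """Largest batch size that keeps batch_size * column_count below the limit."""
        if column_count <= 0:
            raise ValueError("column_count must be positive")
        return (POSTGRES_PARAM_LIMIT - 1) // column_count

    # Example: a 40-column table allows at most 819 rows per batch (819 * 40 = 32,760).
    print(max_write_batch_size(40))  # 819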

Hope this helps! I was able to populate the database (albeit slowly) via ADF, though I would much prefer a COPY-based method for better performance.
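For reference, this is a minimal sketch of what a COPY-based load outside ADF could look like, assuming psycopg2 and a CSV file with a header row; the server name, credentials, table name, and file path are all placeholders:

    # Sketch of a COPY-based load into Azure Database for PostgreSQL using psycopg2.
    # All connection details, the table name, and the file path are placeholders.
    import psycopg2

    conn = psycopg2.connect(
        host="myserver.postgres.database.azure.com",  # placeholder server
        dbname="mydb",
        user="myadmin@myserver",
        password="<password>",
        sslmode="require",
    )

    with conn, conn.cursor() as cur, open("staging_data.csv", "r") as f:
        # COPY ... FROM STDIN streams the whole file over one connection,
        # which is generally much faster than row-by-row batched INSERTs.
        cur.copy_expert(
            "COPY staging_table FROM STDIN WITH (FORMAT csv, HEADER true)", f
        )
    conn.close()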

Upvotes: 2
