Nithin R

Reputation: 41

Sqoop import job fails due to task timeout

I was trying to import a 1 TB MySQL table into HDFS using Sqoop. The command used was:

sqoop import --connect jdbc:mysql://xx.xx.xxx.xx/MyDB --username myuser --password mypass --table mytable --split-by rowkey -m 14

After executing the bounding-values query, all the mappers start, but after some time the tasks get killed due to a timeout (1200 seconds). I think this is because the SELECT query running in each mapper takes longer than the timeout (in Sqoop it seems to be 1200 seconds), so the task fails to report status and subsequently gets killed. (I have also tried a 100 GB data set; it still failed with timeouts on multiple mappers.) A single-mapper import works fine, since no filtered result sets are needed. Is there any way to override the map task timeout (say, set it to 0 or a very high value) while using multiple mappers in Sqoop?
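
To be concrete, what I am hoping for is something along these lines (only a guess on my part, assuming Hadoop's mapred.task.timeout property is the relevant knob here and that generic -D options go before the Sqoop-specific arguments; newer Hadoop versions apparently name it mapreduce.task.timeout):

sqoop import -D mapred.task.timeout=0 --connect jdbc:mysql://xx.xx.xxx.xx/MyDB --username myuser --password mypass --table mytable --split-by rowkey -m 14

As far as I understand, a value of 0 is supposed to disable the timeout entirely, while a large value (in milliseconds) would just raise it.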

Upvotes: 3

Views: 7222

Answers (1)

Jarek Jarcec Cecho

Reputation: 1726

Sqoop uses a special thread to send status updates so that the map task won't get killed by the JobTracker. I would be interested in exploring your issue further. Would you mind sharing the Sqoop log, one of the map task logs, and your table schema?

Jarcec

Upvotes: 1
