Reputation: 3
I have been trying to execute the statements below in Spark with the master set to YARN cluster, but they produce no output, whereas the same code runs without any issue in local mode. Can someone suggest what is wrong here?
In this process, the input is an HDFS directory containing Avro files:
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import scala.util.{Failure, Success, Try}

val rdd = sc.newAPIHadoopFile(
  inAvro,
  classOf[AvroKeyInputFormat[PolicyTransaction]],
  classOf[AvroKey[PolicyTransaction]],
  classOf[NullWritable],
  sc.hadoopConfiguration
)
println(rdd.count()) // This works with Local and Cluster

val avroObj = rdd.map { record =>
  Try {
    val combineRecords = new PolicyTransaction
    println(record._1.datum().getPolicyNumber) // This doesn't work with Local and Cluster
    combineRecords.setPolicyNumber(record._1.datum().getPolicyNumber)
    combineRecords.setLOBCd(record._1.datum().getLOBCd)
    combineRecords.setPolicyVersion(record._1.datum().getPolicyVersion)
    combineRecords.setStatStateProvCd(record._1.datum().getStatStateProvCd)
    combineRecords.setTransactionEffectiveDt(record._1.datum().getTransactionEffectiveDt)
    combineRecords.setTransactionProcessedDt(record._1.datum().getTransactionProcessedDt)
    combineRecords.setQuoteOrPolicy(record._1.datum().getQuoteOrPolicy.get(0))
    combineRecords
  } match {
    case Success(policy) => Right(policy)
    case Failure(error)  => Left(error)
  }
}.cache()
Upvotes: 0
Views: 74
Reputation: 13154
So, as I put in a comment: I assume you mean that the output of the print statement is not showing up. If so, you are making the classic mistake of forgetting that the println inside the map is executed on your workers, not on your driver, and thus you will not see it printed on your driver. Have a look through the log files of the workers and you will see your print statements ;-)
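If the goal is just to eyeball a few values, one option is to pull a small sample back to the driver and print it there. The following is only a minimal sketch: the object name `DriverSidePrint`, the helper `tagPolicies`, and the string data are made up for illustration and stand in for the Avro RDD in the question.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object DriverSidePrint {
  // Hypothetical transformation. The closure passed to map runs on the
  // executors, so on a cluster this println ends up in the executor logs.
  def tagPolicies(policies: RDD[String]): RDD[String] =
    policies.map { p =>
      println(s"executor saw $p")
      p.toUpperCase
    }

  def main(args: Array[String]): Unit = {
    // local[2] is only for trying the sketch out; drop .master(...) when
    // submitting to YARN so the master comes from spark-submit.
    val spark = SparkSession.builder()
      .appName("driver-side-print")
      .master("local[2]")
      .getOrCreate()

    // Made-up stand-in for the records read from Avro.
    val policies = spark.sparkContext.parallelize(Seq("p-001", "p-002", "p-003"))

    // map is lazy: nothing inside tagPolicies runs until an action is called.
    val tagged = tagPolicies(policies)

    // take() is an action. It runs the job and returns a small sample to the
    // driver, so this println runs in the driver process.
    tagged.take(2).foreach(p => println(s"driver saw $p"))

    spark.stop()
  }
}
```

Using `take(n)` rather than `collect()` keeps the amount of data shipped to the driver small, which matters once the input is a real HDFS directory.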
Upvotes: 1
Reputation: 563
I think you're looking in the wrong place! println writes to the standard output of whichever machine the code is running on. So, since the first print runs on the driver, you will probably see it on the console you launched from. Inside the map, however, this is not true: the code is running somewhere on your cluster, so you need to go and check the logs local to those machines.
Alternatively, stop using println and use a logger instead. This will log everything to a shared log environment. You still won't see the output on your console in cluster mode, but at least everything will be collated centrally somewhere for you, saving you the job of figuring out where the code ran.
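A rough sketch of what the logger approach could look like, assuming SLF4J is on the classpath (it normally is in a Spark application). The object name `ExecutorLogging`, the helper `logAndMeasure`, and the sample data are hypothetical. The logger is looked up inside the closure so that it is created on the executor instead of being serialized from the driver.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.slf4j.LoggerFactory

object ExecutorLogging {
  // Hypothetical helper: logs each record on the executor and returns its length.
  def logAndMeasure(policies: RDD[String]): RDD[Int] =
    policies.mapPartitions { records =>
      // Obtained once per partition, on the executor itself.
      val log = LoggerFactory.getLogger("ExecutorLogging")
      records.map { p =>
        log.info(s"processing policy $p")
        p.length
      }
    }

  def main(args: Array[String]): Unit = {
    // local[2] is only for trying the sketch out; drop .master(...) when
    // submitting to YARN so the master comes from spark-submit.
    val spark = SparkSession.builder()
      .appName("executor-logging")
      .master("local[2]")
      .getOrCreate()

    // Made-up stand-in for the records read from Avro.
    val policies = spark.sparkContext.parallelize(Seq("p-001", "p-002", "p-003"))

    // count() is an action, so it triggers the logging inside logAndMeasure.
    println(s"records processed: ${logAndMeasure(policies).count()}")

    spark.stop()
  }
}
```

On YARN with log aggregation enabled, the executor output can usually be fetched in one go with `yarn logs -applicationId <application id>`, which saves hunting through individual worker machines.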
Upvotes: 0