Bab
Bab

Reputation: 191

Spark Scala performance issues with take and InsertInto commands

Please look at the attached screenshot.

I am trying to do some performance improvement to my spark job and its taking almost 5 min to execute the take action on dataframe.

I am using take for making sure that dataframe has some records in it and if it is present, I want to proceed for further processing.

I tried take and count and don't see much difference in the time for execution.

Another scenario is where its taking around 10min to write the datafrane into hive table(it has max 200 rows and 10 columns).

df.write.mode("append").partitionBy("date").insertInto(tablename)

Please suggest how we can minimize the time its taking for take and insert into hive table.

enter image description here

Updates:

Here is my spark submit : spark-submit --master yarn-cluster --class com.xxxx.info.InfoAssets --conf "spark.executor.extraJavaOptions=-XX:+UseCompressedOops -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Djava.security.auth.login.config=kafka_spark_jaas.conf" --files /home/ngap.app.rcrp/hive-site.xml,/home//kafka_spark_jaas.conf,/etc/security/keytabs/ngap.sa.rcrp.keytab --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar --executor-memory 3G --num-executors 3 --executor-cores 10 /home/InfoAssets/InfoAssets.jar

its a simple dataframe which has 8 columns with around 200 records in it and I am using following code to insert into hive table.

df.write.mode("append").partitionBy("partkey").insertInto(hiveDB + "." + tableName)

Thanks,Bab

Upvotes: 0

Views: 992

Answers (1)

Subhasish Guha
Subhasish Guha

Reputation: 232

Don't use count before the write if not necessary and if your table is already created then use Spark SQL to insert the data into Hive Partitioned table.

spark.sql("Insert into <tgt tbl> partition(<col name>) select cols,partition col from temp_tbl")

Upvotes: 0

Related Questions