While writing to a Cassandra table, I get the following information about the number of rows written and the time taken. From the log, I can see that it comes from the TableWriter class. How can I get the same information when reading from Cassandra, without calling an action on the RDD? I am not sure which method is used for reading.
2020-04-20 11:58:42 INFO com.datastax.spark.connector.writer.TableWriter.logInfo:35 - Wrote 24 rows to my_keyspace.mytable in 0.153 s.
Code to write a Spark DataFrame to the Cassandra table:
myDF.write
.format("org.apache.spark.sql.cassandra")
.mode(saveMode)
.options(Map("keyspace" -> "my_keyspace", "table" -> "my_table"))
.save()
Code to read the Cassandra table into a Spark DataFrame:
val cassandraDF = sparkSession.read
.format("org.apache.spark.sql.cassandra")
.options(Map("table" -> "my_table", "keyspace" -> "my_keyspace", "pushdown" -> "true"))
.load()
I was able to get these metrics for reads as well. The difference is the level at which they are logged: on write, the metrics are logged at INFO level, which is why they showed up in my logs, but on read they are logged at DEBUG level. After changing the Spark logging level to DEBUG, I could see the read metrics too. Note that reads are lazy, so these messages only appear once an action actually executes the scan.
Reference - https://community.datastax.com/questions/3512/getting-the-number-of-records-read-from-cassandra.html
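A minimal sketch of the same approach in Scala, assuming Spark with the log4j 1.x API on the classpath (Spark 2.x up to 3.2) and a Cassandra node reachable at 127.0.0.1 (a placeholder). Instead of switching the whole application to DEBUG with `sparkSession.sparkContext.setLogLevel("DEBUG")`, it raises only the `com.datastax.spark.connector` logger, which keeps the rest of Spark's DEBUG output out of the logs. The object `CassandraReadMetrics` and the helper `enableConnectorDebugLogging` are hypothetical names, and the exact wording of the connector's read messages may differ between connector versions.

```scala
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; assumes the log4j 1.x API is on the classpath
// and the DataStax spark-cassandra-connector is a dependency.
object CassandraReadMetrics {

  // Raise only the connector's logger to DEBUG. Child loggers under this package
  // that have no level of their own inherit it; the rest of Spark is unaffected.
  def enableConnectorDebugLogging(): Unit =
    Logger.getLogger("com.datastax.spark.connector").setLevel(Level.DEBUG)

  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder()
      .appName("cassandra-read-metrics")
      .master("local[*]") // local mode: driver and executors share one JVM and one log4j config
      .config("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
      .getOrCreate()

    enableConnectorDebugLogging()

    val cassandraDF = sparkSession.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("table" -> "my_table", "keyspace" -> "my_keyspace", "pushdown" -> "true"))
      .load()

    // Reads are lazy: the connector only logs its DEBUG read statistics
    // once an action such as this one actually scans the table.
    val rows = cassandraDF.rdd.count()
    println(s"Read $rows rows from my_keyspace.my_table")

    sparkSession.stop()
  }
}
```

In local mode the driver and executors run in one JVM, so setting the level in code is enough. On a cluster the read-side messages are written by the executors, so the level would instead have to be set in the executors' log4j configuration, for example with `log4j.logger.com.datastax.spark.connector=DEBUG` in their `log4j.properties`.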