conetfun

Reputation: 1615

Getting the number of records read from Cassandra table while using Spark Cassandra Connector

When writing to a Cassandra table, I get the log line below with the number of rows written and the time taken. From the log, I can see that it comes from the TableWriter class. How can I get the same information when reading from Cassandra, without calling an action on the RDD? I am not sure which method is used for reading.

2020-04-20 11:58:42 INFO  com.datastax.spark.connector.writer.TableWriter.logInfo:35 - Wrote 24 rows to my_keyspace.mytable in 0.153 s.


Code to write the Spark DataFrame to the Cassandra table:

myDF.write
  .format("org.apache.spark.sql.cassandra")
  .mode(saveMode)
  .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table"))
  .save()

Code to read the Cassandra table into a Spark DataFrame:

val cassandraRDD = sparkSession.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "my_table", "keyspace" -> "my_keyspace", "pushdown" -> "true"))
  .load()

Upvotes: 0

Views: 475

Answers (1)

conetfun

Reputation: 1615

I was able to get the metrics for reads too. The difference is the log level at which they are emitted. For writes, the level is INFO, which is why I could find this information in the logs. For reads, the metrics are logged at DEBUG level. I changed the Spark logging level to DEBUG and was then able to see them.
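For illustration, here is a minimal sketch of raising the log level from code rather than from a properties file. It assumes a Spark build that uses the log4j 1.x API, and it assumes the read-side messages come from classes under the com.datastax.spark.connector.rdd package; I have not confirmed the exact logger name, so adjust it to whatever shows up in your own DEBUG output.

```scala
import org.apache.log4j.{Level, Logger}

// Assumption: Spark is using the log4j 1.x API (typical for Spark 2.x builds).
// Assumption: the connector's read-side row counts are logged by classes
// under this package; the package name is a guess, not taken from the question.
Logger.getLogger("com.datastax.spark.connector.rdd").setLevel(Level.DEBUG)

// Noisier alternative that matches what I actually did: switch all Spark
// logging to DEBUG. `sparkSession` is the session from the question.
// sparkSession.sparkContext.setLogLevel("DEBUG")
```

Setting the level this way only affects the JVM in which the code runs. In local mode that is enough, but on a cluster the rows are read on the executors, so their log4j configuration would need the same change. Also, the read is lazy, so the messages can only appear once something actually triggers the scan.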

Reference - https://community.datastax.com/questions/3512/getting-the-number-of-records-read-from-cassandra.html

Upvotes: 0
