Omkar

Reputation: 2423

Saving contents of df.show() as a string in spark-scala app

I need to save the output of df.show() as a string so that I can email it directly.

For example, take the snippet below from the official Spark docs:

val df = spark.read.json("examples/src/main/resources/people.json")

// Displays the content of the DataFrame to stdout
df.show()
// +----+-------+
// | age|   name|
// +----+-------+
// |null|Michael|
// |  30|   Andy|
// |  19| Justin|
// +----+-------+

I need to save the table above, which is printed to the console, as a string. I looked at log4j for logging it, but couldn't find anything on capturing only this output.

Can someone help me with it?

Upvotes: 16

Views: 6306

Answers (2)

Joe K

Reputation: 18424

scala.Console has a withOut method for this kind of thing:

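import java.io.ByteArrayOutputStream

// Buffer that receives everything printed inside the withOut block
val outCapture = new ByteArrayOutputStream
Console.withOut(outCapture) {
  df.show()
}
// Decode the captured bytes (platform default charset)
val result = new String(outCapture.toByteArray)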
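If this is needed in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the names CaptureStdout and capture are made up for illustration, and it assumes UTF-8 is an acceptable encoding for the captured text.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

object CaptureStdout {
  // Hypothetical helper (not part of Spark or the Scala library):
  // runs `body` with Console.out redirected to an in-memory buffer
  // and returns whatever was printed as a String.
  def capture(body: => Unit): String = {
    val buffer = new ByteArrayOutputStream()
    // Fix the encoding explicitly so writing and decoding agree
    val ps = new PrintStream(buffer, true, "UTF-8")
    Console.withOut(ps) {
      body
    }
    ps.flush()
    buffer.toString("UTF-8")
  }
}
```

With a DataFrame in scope, the call would look like val table = CaptureStdout.capture { df.show() }. Console.withOut only redirects Scala's Console for the duration of the block, so the original output stream does not have to be restored by hand.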

Upvotes: 26

T. Gawęda

Reputation: 16076

A workaround is to redirect standard output to a variable:

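val baos = new java.io.ByteArrayOutputStream()
val ps = new java.io.PrintStream(baos)

val oldPs = Console.out
Console.setOut(ps)    // redirect Console output into baos
df.show()
ps.flush()            // make sure everything has reached baos
val content = baos.toString()
Console.setOut(oldPs) // restore the original output stream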

Note that this gives me one deprecation warning, because Console.setOut is deprecated in favor of Console.withOut.

You can also re-implement the Dataset.showString method, which generates the table text. It uses take in the background. Maybe it's also a good moment to create a PR to make showString public? :)
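A rough sketch of what such a re-implementation could look like is below. It is not Spark's actual showString code: TableRenderer and render are hypothetical names, truncation of long cells is left out, and the minimum column width of 3 is an assumption made for this example.

```scala
object TableRenderer {
  // Illustrative only: approximates the df.show() layout,
  // it is not Spark's own showString implementation.
  def render(header: Seq[String], rows: Seq[Seq[String]]): String = {
    val all = header +: rows
    // Each column is as wide as its longest cell (at least 3 characters, an assumption)
    val widths = header.indices.map(i => math.max(3, all.map(_(i).length).max))
    val sep = widths.map(w => "-" * w).mkString("+", "+", "+")
    // Right-align every cell within its column
    def line(cells: Seq[String]): String =
      cells.zip(widths)
        .map { case (c, w) => " " * (w - c.length) + c }
        .mkString("|", "|", "|")
    ((Seq(sep, line(header), sep) ++ rows.map(line)) :+ sep).mkString("\n")
  }
}
```

With Spark, the inputs could be built from df.columns for the header and from df.take(n) for the rows, turning each Row into strings with something like row.toSeq.map(v => if (v == null) "null" else v.toString). That way nothing is printed at all, and the string is available directly.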

Upvotes: 6
