I am a beginner and want to learn about Spark. I am working with spark-shell, running some experiments, and to get results faster I want the work to run on the Spark worker nodes.
I have two machines in total: the driver and one worker run on one machine, and a second worker runs on the other machine.
When I run a count, the result does not come from both nodes. I have a JSON file to read and am doing some performance checking.
Here is the code:
spark-shell --conf spark.sql.warehouse.dir=C:\spark-warehouse --master spark://192.168.0.31:7077

// sc is the SparkContext that spark-shell provides
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// read a JSON file from the local filesystem of each machine
val dfs = sqlContext.read.json("file:///C:/order.json")
dfs.count
The order.json file is present on both machines, but I still get inconsistent output.
Upvotes: 3
Views: 132
Reputation: 1214
1. If you are running Spark across multiple nodes, you need a shared data source such as S3 or HDFS, and every node must be able to access it. With a file:/// path, each executor reads its own local copy, so differing copies give differing results.
val dfs = sqlContext.read.json("file:///C:/order.json")
Change it to an HDFS path that every node can reach (the host and port below are placeholders for your NameNode):
val dfs = sqlContext.read.json("hdfs://<namenode-host>:<port>/order.json")
2. If your data source is fairly small, you can use a Spark broadcast variable to share the data with the other nodes, so that every node sees consistent data: https://spark.apache.org/docs/latest/rdd-programming-guide.html#shared-variables
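For example, here is a minimal sketch of a broadcast variable; the lookup map and its contents are made up for illustration, and sc is the SparkContext from spark-shell:

val smallLookup = Map("orderA" -> 1, "orderB" -> 2)
// ship one read-only copy of the map to every executor
val broadcastLookup = sc.broadcast(smallLookup)
// every task reads the same broadcast value, so all nodes see consistent data
val ids = sc.parallelize(Seq("orderA", "orderB", "orderA"))
val total = ids.map(id => broadcastLookup.value.getOrElse(id, 0)).sum()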
3. To print your logs to the console, configure the log4j file in your Spark conf folder. For details, see Override Spark log4j configurations.
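For example, a sketch based on the log4j 1.x template that Spark ships (copy conf/log4j.properties.template to conf/log4j.properties and adjust the level to taste):

# log INFO and above to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n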
Upvotes: 2