Subhadip

Reputation: 51

Scala Spark - creating nested json output from simple dataframe

Thanks for getting back. The problem I am facing is writing those structs out as nested JSON. For some reason 'toJSON' is not working: it skips the nested fields and always produces a flat structure. How can I write the data to HDFS in nested JSON format?

Upvotes: 5

Views: 3650

Answers (1)

BJC

Reputation: 501

You should create the struct from the fields that have to be nested together. Below is a working example. Assume you have employee data in CSV format containing the company, the department and the employee name, and you want to list all the employees per department per company in JSON format. The code below does that.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class JsonExample {
        public static void main(String[] args) {
            SparkSession sparkSession = SparkSession
                    .builder()
                    .appName("JsonExample")
                    .master("local")
                    .getOrCreate();

            // Read the CSV file (expects the header columns company, department, name)
            Dataset<Row> employees = sparkSession.read().option("header", "true").csv("/tmp/data/emp.csv");
            // Create the temp view
            employees.createOrReplaceTempView("employees");

            // First, group the employees by company AND department.
            // The result replaces the "employees" temp view.
            sparkSession.sql("select company,department,collect_list(name) as department_employees from employees group by company,department").createOrReplaceTempView("employees");

            /* Now build a struct with the built-in struct() SQL function (no UDF is needed).
             * The struct contains the department and the list of its employees.
             */
            sparkSession.sql("select company,collect_list(struct(department,department_employees)) as department_info from employees group by company").toJSON().show(false);
        }
    }
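
To actually write the nested result to HDFS rather than just show it, a minimal sketch along these lines should work, assuming the same company, department and name columns as above. It builds the same nesting with the DataFrame API instead of SQL and saves it with write().json(). The class name NestedJsonWriter, the helper nestByCompany and the output path hdfs:///tmp/data/emp_nested_json are placeholders of mine, not part of the original example.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.collect_list;
import static org.apache.spark.sql.functions.struct;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class NestedJsonWriter {

    // Hypothetical helper: nests employees per department per company.
    // Assumes the input has the columns company, department and name.
    static Dataset<Row> nestByCompany(Dataset<Row> employees) {
        // Step 1: one row per (company, department) with the list of employee names
        Dataset<Row> perDepartment = employees
                .groupBy(col("company"), col("department"))
                .agg(collect_list(col("name")).as("department_employees"));

        // Step 2: one row per company with an array of department structs
        return perDepartment
                .groupBy(col("company"))
                .agg(collect_list(struct(col("department"), col("department_employees")))
                        .as("department_info"));
    }

    public static void main(String[] args) {
        // Assumption: the master is supplied by spark-submit,
        // and the cluster's Hadoop configuration points at your HDFS.
        SparkSession spark = SparkSession.builder()
                .appName("NestedJsonWriter")
                .getOrCreate();

        Dataset<Row> employees = spark.read()
                .option("header", "true")
                .csv("/tmp/data/emp.csv");

        // Placeholder output path; struct and array columns are written as nested JSON
        nestByCompany(employees)
                .write()
                .mode("overwrite")
                .json("hdfs:///tmp/data/emp_nested_json");

        spark.stop();
    }
}
```

Spark writes one JSON object per line into part files under the output directory, so each company ends up as a single line with its department_info array nested inside.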

You can find the same example on my blog: http://baahu.in/spark-how-to-generate-nested-json-using-dataset/

Upvotes: 2
