third_eye

Reputation: 428

Complex Json schema into custom spark dataframe

OK, so I'm getting a big JSON string from an API call, and I want to save parts of it into Cassandra. I'm trying to parse the JSON string into a more table-like structure, but keeping only some of the fields. The overall schema looks like this:

(image: printSchema() output, not reproduced)

And I want my table built from the regnum, date and value fields. With

 sqlContext.read.json(vals)
   .select(explode('register) as 'reg)
   .select("reg.@regnum", "reg.data.date", "reg.data.value")
   .show

I can get a table like this:

(image: resulting table, not reproduced)

But as you can see, the date and value fields are arrays. I would like one element per record, with the corresponding regnum duplicated for each row. Any help is very much appreciated.

Upvotes: 0

Views: 1097

Answers (1)

Thang Nguyen

Reputation: 1110

You can convert your DataFrame to a Dataset and then flatMap over it (the .as[...] conversion needs a tuple encoder, which import spark.implicits._ provides):

 df.select("reg.@regnum", "reg.data.date", "reg.data.value")
   .as[(Long, Array[String], Array[String])]                   // each row: (regnum, dates, values)
   .flatMap(s => s._2.zip(s._3).map(p => (s._1, p._1, p._2))) // one output row per (date, value) pair
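The zip-and-flatMap step itself can be sketched on plain Scala collections, without a Spark session; the regnum/date/value sample data below is hypothetical, just to show how each row's parallel arrays are paired up and flattened into one row per pair:

```scala
// Each input record mirrors one Dataset row: (regnum, dates, values).
val records = Seq(
  (1L, Array("2017-01-01", "2017-01-02"), Array("10", "20")),
  (2L, Array("2017-01-03"), Array("30"))
)

// zip pairs dates(i) with values(i); flatMap emits one
// (regnum, date, value) tuple per pair, duplicating regnum.
val rows = records.flatMap { case (regnum, dates, values) =>
  dates.zip(values).map { case (d, v) => (regnum, d, v) }
}
// rows: Seq((1,2017-01-01,10), (1,2017-01-02,20), (2,2017-01-03,30))
```

Note that zip silently truncates to the shorter array, so if date and value can have different lengths you may want to check for that before flattening.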

Upvotes: 2
