user2014111

Reputation: 843

How to read Hive table with column with JSON strings?

I have a Hive table with a column (Json_String String) that has some 1000 rows, where each row is a JSON string of the same structure. I am trying to read the JSON into a DataFrame as below

val df = sqlContext.read.json("select Json_String from json_table") 

but it throws the exception below

java.io.IOException: No input paths specified in job

Is there any way to read all the rows into a DataFrame, as we do with JSON files using a wildcard?

val df = sqlContext.read.json("file:///home/*.json")

Upvotes: 0

Views: 2241

Answers (1)

Jacek Laskowski

Reputation: 74779

I think what you're asking for is to read the Hive table as usual and transform the JSON column using the from_json function.

from_json(e: Column, schema: StructType): Column Parses a column containing a JSON string into a StructType with the specified schema. Returns null, in the case of an unparseable string.
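
For reference, a minimal sketch of what that looks like on Spark 2.1 or later (the schema fields id and name are hypothetical placeholders for your JSON's structure):

import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._
import spark.implicits._

// Schema shared by every JSON string in the column (field names are hypothetical)
val jsonSchema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType)))

val df = spark.table("json_table")                               // read the Hive table as usual
  .withColumn("parsed", from_json($"Json_String", jsonSchema))   // parse the JSON column
  .select("parsed.*")                                            // flatten the struct into top-level columns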

Given that you use sqlContext in your code, I'm afraid you're on Spark < 2.1.0, which does not offer from_json (it was added in 2.1.0).

The solution then is to use a custom user-defined function (UDF) to do the parsing yourself.
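
A minimal sketch of that approach, assuming the JSON has a name field you want to extract (the field name is hypothetical; json4s ships with Spark):

import org.apache.spark.sql.functions.udf
import org.json4s._
import org.json4s.jackson.JsonMethods.parse
import sqlContext.implicits._

// Hypothetical UDF that extracts a single field from each JSON string
val extractName = udf { json: String =>
  implicit val formats = DefaultFormats
  (parse(json) \ "name").extractOpt[String].orNull
}

val parsed = sqlContext.table("json_table")
  .withColumn("name", extractName($"Json_String"))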

val df = sqlContext.read.json("select Json_String from json_table")

Your original call won't work since the json operator expects a path or paths to JSON files on disk, not the result of executing a query against a Hive table.

json(paths: String*): DataFrame Loads a JSON file (JSON Lines text format or newline-delimited JSON) and returns the result as a DataFrame.
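
There is also a json(jsonRDD: RDD[String]) overload in Spark 1.x (deprecated later in favour of a Dataset variant) that takes the JSON strings themselves rather than file paths, so another option is to feed it the column as an RDD. A sketch, assuming the table and column names from your question:

val jsonRDD = sqlContext.table("json_table")
  .select("Json_String")
  .rdd
  .map(_.getString(0))          // RDD[String] with one JSON document per row

val df = sqlContext.read.json(jsonRDD)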

Upvotes: 1
