I am new to working with JSON data on Hive. I am working on a Spark application that receives JSON data and stores it in Hive tables. The JSON consists of nested objects: each top-level field holds further nested objects, each carrying rating and rating_numeric fields.
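A representative fragment of the data (field names taken from the table schema below; the values here are just placeholders) looks roughly like this:

{
  "web_app_security": {
    "rating": "good",
    "rating_numeric": 8.5,
    "config_web_cms_authentication": { "rating": "good", "rating_numeric": 9.0 },
    "web_threat_intel_alert_external": { "rating": "fair", "rating_numeric": 6.2 },
    "web_http_security_headers": { "rating": "good", "rating_numeric": 7.8 }
  },
  "dns_security": { "rating": "good", "rating_numeric": 8.1 }
}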
I am able to read the JSON into a DataFrame and save it to a location on HDFS, but getting Hive to read that data is the tough part.
After searching online for examples, I tried to use the STRUCT type for all the JSON fields and then access the elements using column.element. For example, web_app_security will be the name of a column (of type STRUCT) inside the table, and the nested objects in it, like config_web_cms_authentication and web_threat_intel_alert_external, will also be structs (with rating and rating_numeric as their fields).
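For example, once the table works, I would expect to query the nested fields like this:

SELECT web_app_security.rating,
       web_app_security.config_web_cms_authentication.rating_numeric
FROM jsons;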
I tried creating the table with the JSON SerDe. Here is my table definition:
CREATE EXTERNAL TABLE jsons (
web_app_security struct<config_web_cms_authentication: struct<rating: string, rating_numeric: float>, web_threat_intel_alert_external: struct<rating: string, rating_numeric: float>, web_http_security_headers: struct<rating: string, rating_numeric: float>, rating: string, rating_numeric: float>,
dns_security struct<domain_hijacking_protection: struct<rating: string, rating_numeric: float>, rating: string, rating_numeric: float, dns_hosting_providers: struct<rating:string, rating_numeric: float>>,
email_security struct<rating: string, email_encryption_enabled: struct<rating: string, rating_numeric: float>, rating_numeric: float, email_hosting_providers: struct<rating: string, rating_numeric: float>, email_authentication: struct<rating: string, rating_numeric: float>>,
threat_intell struct<rating: string, threat_intel_alert_internal_3: struct<rating: string, rating_numeric: float>, threat_intel_alert_internal_1: struct<rating: string, rating_numeric: float>, rating_numeric: float, threat_intel_alert_internal_12: struct<rating: string, rating_numeric: float>, threat_intel_alert_internal_6: struct<rating: string, rating_numeric: float>>,
data_loss struct<data_loss_6: struct<rating: string, rating_numeric: float>, rating: string, data_loss_36plus: struct<rating: string, rating_numeric: float>, rating_numeric: float, data_loss_36: struct<rating: string, rating_numeric: float>, data_loss_12: struct<rating: string, rating_numeric: float>, data_loss_24: struct<rating: string, rating_numeric: float>>,
system_hosting struct<host_hosting_providers: struct<rating: string, rating_numeric: float>, hosting_countries: struct<rating: string, rating_numeric: float>, rating: string, rating_numeric: float>,
defensibility struct<attack_surface_web_ip: struct<rating: string, rating_numeric: float>, shared_hosting: struct<rating: string, rating_numeric: float>, defensibility_hosting_providers: struct<rating: string, rating_numeric: float>, rating: string, rating_numeric: float, attack_surface_web_hostname: struct<rating: string, rating_numeric: float>>,
software_patching struct<patching_web_cms: struct<rating: string, rating_numeric: float>, rating: string, patching_web_server: struct<rating: string, rating_numeric: float>, patching_vuln_open_ssl: struct<rating: string, rating_numeric: float>, patching_app_server: struct<rating: string, rating_numeric: float>, rating_numeric: float>,
governance struct<governance_customer_base: struct<rating: string, rating_numeric: float>, governance_security_certifications: struct<rating: string, rating_numeric: float>, governance_regulatory_requirements: struct<rating: string, rating_numeric: float>, rating: string, rating_numeric: float>
)ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS orc
LOCATION 'hdfs://nameservice1/data/gis/final/rr_current_analysis'
I tried to parse the rows with the JSON SerDe. After saving some data to the table, I get the following error when I try to query it:
Error: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to org.apache.hadoop.io.Text (state=,code=0)
I am not sure if I am doing this the right way. I am open to other ways of storing the data in the table as well. Any help would be appreciated. Thank you.
That's because you're mixing ORC as the storage format (STORED AS orc) with JSON as the SerDe (ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'). That overrides ORC's default SerDe (OrcSerde), but not its input (OrcInputFormat) and output (OrcOutputFormat) formats, so Hive reads OrcStruct objects off disk and then hands them to a SerDe that expects Text. That mismatch is exactly the ClassCastException you're seeing.
You either need to use ORC storage without overriding its default SerDe; in that case, make sure that your Spark application writes ORC data, not JSON.
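A minimal sketch of the ORC option (column list abbreviated here; keep the full struct definitions from your original statement):

CREATE EXTERNAL TABLE jsons (
  web_app_security struct<...>,
  -- ... remaining struct columns exactly as in the original definition ...
  governance struct<...>
)
STORED AS ORC
LOCATION 'hdfs://nameservice1/data/gis/final/rr_current_analysis';

On the Spark side, writing with df.write.orc(...) (or saveAsTable into an ORC table) produces files this table can read.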
Or, if you want the data to be stored as JSON, use the JsonSerDe together with plain text files as the storage (STORED AS TEXTFILE).
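A sketch of the JSON option, again with the column list abbreviated (the full struct definitions stay as they are):

CREATE EXTERNAL TABLE jsons (
  web_app_security struct<...>,
  -- ... remaining struct columns exactly as in the original definition ...
  governance struct<...>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE
LOCATION 'hdfs://nameservice1/data/gis/final/rr_current_analysis';

Here each line of the files at that location must be a single JSON document, which is what Spark's df.write.json(...) produces.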
The Hive Developer Guide explains how SerDes and storage formats work together: https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-HiveSerDe