Jason
Jason

Reputation: 53

Correct usage of a struct in Hive

I want to use a struct which contains a string and an array. The string represents the name of a service, while the array lists the company ids which carry out that service.

I know that I can create and insert into a struct look so:

CREATE TABLE struct_test
(
 property_id INT,
 service STRUCT<
                type: STRING
               ,provider: ARRAY<INT>
               >
);

INSERT INTO TABLE struct_test 
    SELECT 989, NAMED_STRUCT('type','Cleaning','provider', ARRAY(587, 887)) AS address 
    FROM tmp LIMIT 1;

This gives me the following:

>{"type":"Cleaning","provider":[587,887]}

However, I want to account for multiple service types on the same property. How could I have more than one type in my service struct?

Ideally, I would want to achieve something similar to the below:

{"type":"Cleaning","provider":[587,887]}, {"type":"Pricing","provider":[932]}, {"type":"Security","provider":[187,577,946]}

This way I can store multiple services in the column and the providers who provide that service. Can this model be achieved using a struct?

Upvotes: 5

Views: 11050

Answers (1)

leftjoin
leftjoin

Reputation: 38290

This achievable with array<struct<type: STRING, provider: ARRAY<INT>>>

Create table:

CREATE TABLE struct_test
(
 property_id INT,
 service array<STRUCT<
                type: STRING
               ,provider: ARRAY<INT>
               >>
);

Insert data:

with 
test_data as(
    SELECT 989 property_id, array(NAMED_STRUCT('type','Cleaning','provider', ARRAY(587, 887)),
                      NAMED_STRUCT('type','Pricing','provider', ARRAY(932))
                      ) as service
)

INSERT INTO TABLE struct_test 
select * from test_data;

Check data:

select t.property_id, t.service from struct_test t;

Result:

OK
property_id     service
989     [{"type":"Cleaning","provider":[587,887]},{"type":"Pricing","provider":[932]}]
Time taken: 0.064 seconds, Fetched: 1 row(s)

And if you want collect array of structs in the select rather than hardcoded values, see this answer: https://stackoverflow.com/a/48175749/2700344

For such a complex data type you definitely need brickhouse library

Upvotes: 4

Related Questions