rev
rev

Reputation: 215

Hive - Checking if an array in each row of a table contains any matching data in a column in another table

I have two tables with columns as shown below.

Posts:
user STRING,
tag_list ARRAY<STRING>

Tags:
tag STRING

Here is some sample data in those tables.

Posts:
user1   help, pig
user2   bigdata, hadoop, query, hiveql
user1   hive, hiveql, help

Tags:
hadoop
hiveql

If I want to filter out only the rows in Posts table that contain the tags listed in Tags table as shown in the results below, how would I do that in a hive query? I am not sure as to how to join these two tables using array_contains.

Expected Query Result:
user2   bigdata, hadoop, query, hiveql
user1   hive, hiveql, help

Thanks

Upvotes: 5

Views: 6827

Answers (1)

rev
rev

Reputation: 215

I was able to get the expected result with this query.

select distinct b.user, b.tag_list from 
(select tag from tags) a
join
(select user, tag_list, exp from posts
lateral view explode(tag_list) exploded_table as exp) b
on (a.tag = b.exp);

Upvotes: 5

Related Questions