Reputation: 75
I'd like to implement a join with regex/rlike condition. But Hive doesn't do inequality joins
select a.col_1, b.col_2
from table1 a left join table2 b
on a.col_1 rlike b.col_2
This actually works, but I want to match the full text in b.col2 to a string in a.col_1. Is there a way to do this ?
example dataset:
**table1**
apple iphone
apple iphone 6s
google nexus
samsung galaxy tab
**table2**
apple
google
nexus
**outcome**
col1 col2
apple iphone apple
apple iphone 6s apple
google nexus google
samsung galaxy tab null
Upvotes: 1
Views: 781
Reputation: 44991
select col1
,col2
from (select t1.col1
,t2.col2
,count (col2) over (partition by col1) as count_col2
,row_number () over (partition by col1,col2) as rn
from (select *
from table1 t1
lateral view explode(split(col1,'\\s+')) e as token
) t1
left join (select *
from table2 t2
lateral view explode(split(col2,'\\s+')) e as token
) t2
on t2.token =
t1.token
) t
where ( count_col2 = 0
or col1 rlike concat ('\\b',col2,'\\b')
)
and rn = 1
;
+--------------------+--------+
| col1 | col2 |
+--------------------+--------+
| apple iphone | apple |
| apple iphone 6s | apple |
| google nexus | google |
| google nexus | nexus |
| samsung galaxy tab | (null) |
+--------------------+--------+
Upvotes: 1