Reputation: 35
I'm trying a simple INNER JOIN between two tables in Hive. I have one table of ORDERS and the other one is a LOG table. This is the structure of both:
ORDERS:
id_operacion string
fecha string
id_usuario string
id_producto string
unidades int
id_bono string
precio float
precio_total float
ip string
LOGS:
host STRING
identity STRING
user STRING
time STRING
request STRING
status STRING
size STRING
referer STRING
agent STRING
Not all the 'ip' in the order table have a value, some of them are null.
I would like to get the values order.id_usuario,order.id_producto,logs.host,logs.agent
. In the cases that the order.ip=log.host.
I'm trying this query:
SELECT order.id_producto,order.id_usuario,log.host,log.agent
FROM order JOIN log ON order.ip=log.host;
I don't know why but the query responds me with all the agents the table of logs has, and not only with the ones that match with the IP address of both tables.
I hope I've explained well the issue. Any ideas?
Upvotes: 2
Views: 25511
Reputation: 5307
It sounds like a problem with the join. It could be to do with the nulls. It could also be the data in the tables. Try this?
SELECT order.id_producto,order.id_usuario,log.host,log.agent
FROM order
JOIN log ON order.ip=log.host
WHERE order.ip is not null
Upvotes: 1