Reputation:
I am trying to import a table from a MySQL database into HDFS using Sqoop. The table has fields col1, col2, col3, col4, and col5, and many rows contain NULL values: 4265550 rows have NULL values and 1421851 rows do not. I want to exclude the rows with NULL values while importing the table into HDFS. Following is my command:
sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/database" \
  --username root --password cloudera \
  --query 'SELECT * FROM table_export WHERE ((col1 IS NOT NULL) OR (col2 IS NOT NULL) OR (col3 IS NOT NULL) OR (col4 IS NOT NULL) OR (col5 IS NOT NULL) AND $CONDITIONS)' \
  --target-dir /user/cloudera/input/table_export \
  --direct -m 1
But when I run this, rows with NULL values are imported too. Where did I go wrong?
If I need to use a --where condition instead, these are the two options I am considering:
Case1:
--where (col1 IS NOT NULL) OR (col2 IS NOT NULL) OR (col3 IS NOT NULL) OR (col4 IS NOT NULL) OR (col5 IS NOT NULL)
Case2:
--where (col1 != NULL) OR (col2 != NULL) OR (col3 != NULL) OR (col4 != NULL) OR (col5 != NULL)
Which one should I use? Please help me. Thanks in advance.
Upvotes: 0
Views: 400
Reputation: 469
You can use Case1.
If you don't want any NULL values in the output, change OR to AND in the query you pasted, so that only rows where all five columns are non-NULL are imported.
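For example, the corrected import might look like this (an untested sketch based on the command you pasted; note that AND $CONDITIONS is kept outside the NOT NULL group so it applies to the whole filter):

sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/database" \
  --username root --password cloudera \
  --query 'SELECT * FROM table_export WHERE (col1 IS NOT NULL AND col2 IS NOT NULL AND col3 IS NOT NULL AND col4 IS NOT NULL AND col5 IS NOT NULL) AND $CONDITIONS' \
  --target-dir /user/cloudera/input/table_export \
  --direct -m 1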
Upvotes: 0
Reputation: 38290
You should use Case1.
--where "(col1 IS NOT NULL) OR (col2 IS NOT NULL) OR (col3 IS NOT NULL) OR (col4 IS NOT NULL) OR (col5 IS NOT NULL)"
Case2 does not work because the result of (col1 != NULL) is undefined: NULL cannot be equal or not equal to another value or to NULL. See also this answer: https://stackoverflow.com/a/41025180/2700344
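You can see this in MySQL itself: comparisons with NULL return NULL rather than TRUE or FALSE, so such rows never satisfy the filter.

SELECT NULL != 1 AS a, NULL = NULL AS b, NULL IS NOT NULL AS c;
-- returns NULL, NULL, 0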
Upvotes: 1