zsad512
zsad512

Reputation: 881

Cloudera Sqoop Import SQL query trouble "where" clause

I have a db in cloudera. Using two of the tables, I am trying to find the account records that have only 1 device recorded in the accountdevice table. To do this, I have generated the following query:

[training@localhost ~]$ sqoop import -P \
> --connect jdbc:mysql://localhost/loudacre \
> --username training \
> --target-dir /ZXS107020/loudacre/pset1 \
> --split-by accounts.acct_num \
> --query 'SELECT first_name, last_name, acct_num, city, state FROM accounts JOIN accountdevice ON (accounts.acct_num = accountdevice.account_id) WHERE $CONDITIONS AND count(accountdevice.account_id) = 1'

However, this doesnt work and produces the following message:

18/02/02 07:13:16 ERROR manager.SqlManager: Error executing statement: java.sql.SQLException: Invalid use of group function
java.sql.SQLException: Invalid use of group function
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:996)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2435)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2582)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2530)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1907)
    at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2030)
    at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:753)
    at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:762)
    at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
    at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
    at org.apache.sqoop.manager.SqlManager.getColumnTypesForQuery(SqlManager.java:234)
    at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:304)
    at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
    at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
    at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
18/02/02 07:13:16 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
    at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1651)
    at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

The tables I am working with are structured as follows:

mysql> describe accountdevice;
+-------------------+--------------+------+-----+---------+----------------+
| Field             | Type         | Null | Key | Default | Extra          |
+-------------------+--------------+------+-----+---------+----------------+
| id                | int(11)      | NO   | PRI | NULL    | auto_increment |
| account_id        | int(11)      | NO   | MUL | NULL    |                |
| device_id         | int(11)      | NO   | MUL | NULL    |                |
| activation_date   | datetime     | NO   |     | NULL    |                |
| account_device_id | varchar(255) | NO   |     | NULL    |                |
+-------------------+--------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

mysql> describe accounts;
+----------------+--------------+------+-----+---------+-------+
| Field          | Type         | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| acct_num       | int(11)      | NO   | PRI | NULL    |       |
| acct_create_dt | datetime     | NO   |     | NULL    |       |
| acct_close_dt  | datetime     | YES  |     | NULL    |       |
| first_name     | varchar(255) | NO   |     | NULL    |       |
| last_name      | varchar(255) | NO   |     | NULL    |       |
| address        | varchar(255) | NO   |     | NULL    |       |
| city           | varchar(255) | NO   |     | NULL    |       |
| state          | varchar(255) | NO   |     | NULL    |       |
| zipcode        | varchar(255) | NO   |     | NULL    |       |
| phone_number   | varchar(255) | NO   |     | NULL    |       |
| created        | datetime     | NO   |     | NULL    |       |
| modified       | datetime     | NO   |     | NULL    |       |
+----------------+--------------+------+-----+---------+-------+

The query that I am trying to run is this: select the account information for clients that have exactly one device registered.

What am I doing wrong? I have tried using 'WHERE $CONDITIONS AND' as well as using "WHERE \$CONDITIONS"

Upvotes: 0

Views: 682

Answers (1)

roh
roh

Reputation: 1053

I would suggest running the query in MySQL first which will give you a validation that query is working fine. I think there is something wrong with the query.

The other problem you have to consider is The sqoop import --query option is intended to process a single statement and there are warnings about using complex queries.

From the sqoop Doc :

The facility of using free-form query in the current version of Sqoop is limited to simple queries where there are no ambiguous projections and no OR conditions in the WHERE clause. Use of complex queries such as queries that have sub-queries or joins leading to ambiguous projections can lead to unexpected results.

Suggestion 1 : Run the query in mysql and push the data to a new table in mysql and import the records with sqoop from the new table.

Suggestion 2 : Create a stored procedure in mysql with the complex query you want to run and call it through the --query option in sqoop import. Something like below:

-- creating the stored procedure in my sql
mysql> CREATE PROCEDURE simpleprocforimport (OUT param1 INT)
    -> BEGIN
    ->   SELECT first_name, last_name, acct_num, city, state FROM accounts JOIN accountdevice ON (accounts.acct_num = accountdevice.account_id) AND count(accountdevice.account_id) = 1;
    -> END//

#From the sqoop import just call the procedure as below
sqoop import -P \
--connect jdbc:mysql://localhost/loudacre \
--username training \
--target-dir /ZXS107020/loudacre/pset1 \
--split-by accounts.acct_num \
--query "CALL simpleprocforimport (@a);"

I'm not in mysql Env to test out, but let me know if you ran into any issues.

Upvotes: 1

Related Questions