Reputation: 881
I have a db in cloudera. Using two of the tables, I am trying to find the account
records that have only 1 device recorded in the accountdevice
table. To do this, I have generated the following query:
[training@localhost ~]$ sqoop import -P \
> --connect jdbc:mysql://localhost/loudacre \
> --username training \
> --target-dir /ZXS107020/loudacre/pset1 \
> --split-by accounts.acct_num \
> --query 'SELECT first_name, last_name, acct_num, city, state FROM accounts JOIN accountdevice ON (accounts.acct_num = accountdevice.account_id) WHERE $CONDITIONS AND count(accountdevice.account_id) = 1'
However, this doesnt work and produces the following message:
18/02/02 07:13:16 ERROR manager.SqlManager: Error executing statement: java.sql.SQLException: Invalid use of group function
java.sql.SQLException: Invalid use of group function
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:996)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2435)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2582)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2530)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1907)
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2030)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:753)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:762)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForQuery(SqlManager.java:234)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:304)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
18/02/02 07:13:16 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1651)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
The tables I am working with are structured as follows:
mysql> describe accountdevice;
+-------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| account_id | int(11) | NO | MUL | NULL | |
| device_id | int(11) | NO | MUL | NULL | |
| activation_date | datetime | NO | | NULL | |
| account_device_id | varchar(255) | NO | | NULL | |
+-------------------+--------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
mysql> describe accounts;
+----------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| acct_num | int(11) | NO | PRI | NULL | |
| acct_create_dt | datetime | NO | | NULL | |
| acct_close_dt | datetime | YES | | NULL | |
| first_name | varchar(255) | NO | | NULL | |
| last_name | varchar(255) | NO | | NULL | |
| address | varchar(255) | NO | | NULL | |
| city | varchar(255) | NO | | NULL | |
| state | varchar(255) | NO | | NULL | |
| zipcode | varchar(255) | NO | | NULL | |
| phone_number | varchar(255) | NO | | NULL | |
| created | datetime | NO | | NULL | |
| modified | datetime | NO | | NULL | |
+----------------+--------------+------+-----+---------+-------+
The query that I am trying to run is this: select the account information for clients that have exactly one device registered.
What am I doing wrong? I have tried using 'WHERE $CONDITIONS AND'
as well as using "WHERE \$CONDITIONS"
Upvotes: 0
Views: 682
Reputation: 1053
I would suggest running the query in MySQL first which will give you a validation that query is working fine. I think there is something wrong with the query.
The other problem you have to consider is The sqoop import --query
option is intended to process a single statement and there are warnings about using complex queries.
From the sqoop Doc :
The facility of using free-form query in the current version of Sqoop is limited to simple queries where there are no ambiguous projections and no OR conditions in the WHERE clause. Use of complex queries such as queries that have sub-queries or joins leading to ambiguous projections can lead to unexpected results.
Suggestion 1 : Run the query in mysql and push the data to a new table in mysql and import the records with sqoop from the new table.
Suggestion 2 : Create a stored procedure in mysql with the complex query you want to run and call it through the --query
option in sqoop import. Something like below:
-- creating the stored procedure in my sql
mysql> CREATE PROCEDURE simpleprocforimport (OUT param1 INT)
-> BEGIN
-> SELECT first_name, last_name, acct_num, city, state FROM accounts JOIN accountdevice ON (accounts.acct_num = accountdevice.account_id) AND count(accountdevice.account_id) = 1;
-> END//
#From the sqoop import just call the procedure as below
sqoop import -P \
--connect jdbc:mysql://localhost/loudacre \
--username training \
--target-dir /ZXS107020/loudacre/pset1 \
--split-by accounts.acct_num \
--query "CALL simpleprocforimport (@a);"
I'm not in mysql Env to test out, but let me know if you ran into any issues.
Upvotes: 1