Reputation: 1
I have a 5-node Hortonworks cluster (version 2.4.2) on which I have installed HAWQ 2.0.0.
The 5 nodes are: edge, master (NameNode), node1 (DataNode 1), node2 (DataNode 2), node3 (DataNode 3).
I followed this link to install HAWQ on HDP - http://hdb.docs.pivotal.io/hdb/install/install-ambari.html
The HAWQ components are installed on these nodes:
HAWQ master - node1
HAWQ standby master - node2
HAWQ segments - node1, node2, node3
At installation time the HAWQ master, HAWQ standby master, and HAWQ segments were installed successfully, but the basic HAWQ test that the Ambari installer runs failed.
Below is the operation performed by the installer:
2016-06-30 00:24:22,513 - --- Check state of HAWQ cluster ---
2016-06-30 00:24:22,513 - Executing hawq status check...
2016-06-30 00:24:22,514 - Command executed: su - gpadmin -c "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null node1.localdomain \"source /usr/local/hawq/greenplum_path.sh && hawq state -d /data/hawq/master \" "
2016-06-30 00:24:23,343 - Output of command:
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:--HAWQ instance status summary
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:------------------------------------------------------
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Master instance = Active
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Master standby = node2.localdomain
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Standby master state = Standby host passive
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Total segment instance count from config file = 3
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:------------------------------------------------------
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Segment Status
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:------------------------------------------------------
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Total segments count from catalog = 1
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Total segment valid (at master) = 0
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Total segment failures (at master) = 3
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Total number of postmaster.pid files missing = 0
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Total number of postmaster.pid files found = 3
2016-06-30 00:24:23,344 - --- Check if HAWQ can write and query from a table ---
2016-06-30 00:24:23,344 - Dropping ambari_hawq_test table if exists
2016-06-30 00:24:23,344 - Command executed: su - gpadmin -c "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null node1.localdomain \"export PGPORT=5432 && source /usr/local/hawq/greenplum_path.sh && psql -d template1 -c \\\"DROP TABLE IF EXISTS ambari_hawq_test;\\\" \" "
2016-06-30 00:24:23,436 - Output:
DROP TABLE
2016-06-30 00:24:23,436 - Creating table ambari_hawq_test
2016-06-30 00:24:23,436 - Command executed: su - gpadmin -c "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null node1.localdomain \"export PGPORT=5432 && source /usr/local/hawq/greenplum_path.sh && psql -d template1 -c \\\"CREATE TABLE ambari_hawq_test (col1 int) DISTRIBUTED RANDOMLY;\\\" \" "
2016-06-30 00:24:23,693 - Output:
CREATE TABLE
2016-06-30 00:24:23,693 - Inserting data to table ambari_hawq_test
2016-06-30 00:24:23,693 - Command executed: su - gpadmin -c "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null node1.localdomain \"export PGPORT=5432 && source /usr/local/hawq/greenplum_path.sh && psql -d template1 -c \\\"INSERT INTO ambari_hawq_test SELECT * FROM generate_series(1,10);\\\" \" "
Above we can see that the DROP and CREATE TABLE statements were executed, but the INSERT operation didn't succeed.
So I executed the INSERT command manually on the HAWQ master node, i.e. node1.
These are the steps I executed manually:
[root@node1 ~]# su - gpadmin
[gpadmin@node1 ~]$ psql
psql (8.4.20, server 8.2.15)
WARNING: psql version 8.4, server version 8.2.
Some psql features might not work.
Type "help" for help.
gpadmin=#
gpadmin=# \c gpadmin
psql (8.4.20, server 8.2.15)
WARNING: psql version 8.4, server version 8.2.
Some psql features might not work.
You are now connected to database "gpadmin".
gpadmin=# create table test(name varchar);
gpadmin=# insert into test values('vikash');
The above INSERT operation threw an error after a long time:
ERROR: failed to acquire resource from resource manager, resource request is timed out due to no available cluster (pquery.c:804)
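When this timeout appears, one quick sanity check is whether the resource manager actually sees any usable segments. As a sketch (assuming the pg_resqueue_status view is available in this HAWQ 2.0 build; gp_segment_configuration definitely is, and its output appears further below):
gpadmin=# select * from pg_resqueue_status;   -- resource queue usage, if the view exists in this build
gpadmin=# select registration_order, role, status, hostname, address from gp_segment_configuration;   -- registered segments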
Also, the HAWQ segment log on node1 shows:
[root@node1 ambari-agent]# tail -f /data/hawq/segment/pg_log/hawq-2016-06-30_045853.csv
2016-06-30 05:10:24.522688 EDT,,,p248618,th-1357371264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 192.168.122.1",,,,,,,0,,"network_utils.c",210,
2016-06-30 05:10:54.603726 EDT,,,p248618,th-1357371264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 127.0.0.1",,,,,,,0,,"network_utils.c",210,
2016-06-30 05:10:54.603769 EDT,,,p248618,th-1357371264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 2.10.1.71",,,,,,,0,,"network_utils.c",210,
2016-06-30 05:10:54.603778 EDT,,,p248618,th-1357371264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 192.168.122.1",,,,,,,0,,"network_utils.c",210,
2016-06-30 05:11:24.625919 EDT,,,p248618,th-1357371264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 127.0.0.1",,,,,,,0,,"network_utils.c",210,
2016-06-30 05:11:24.626088 EDT,,,p248618,th-1357371264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 2.10.1.71",,,,,,,0,,"network_utils.c",210,
2016-06-30 05:11:24.626129 EDT,,,p248618,th-1357371264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 192.168.122.1",,,,,,,0,,"network_utils.c",210,
I also checked gp_segment_configuration:
gpadmin=# select * from gp_segment_configuration
gpadmin-# ;
registration_order | role | status | port | hostname | address | description
--------------------+------+--------+-------+-------------------+-----------+------------------------------------
-1 | s | u | 5432 | node2.localdomain | 2.10.1.72 |
0 | m | u | 5432 | node1 | node1 |
1 | p | d | 40000 | node1.localdomain | 2.10.1.71 | resource manager process was reset
(3 rows)
NOTE: In hawq-site.xml, the resource management type is selected as "STANDALONE" instead of "YARN" in the dropdown.
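For reference, my understanding is that this Ambari choice maps to the hawq_global_rm_type property in hawq-site.xml, roughly like this (the property name and values are taken from the HAWQ documentation, not from my installer output):
<property>
  <name>hawq_global_rm_type</name>
  <value>none</value>  <!-- "none" = built-in standalone resource manager, "yarn" = YARN mode -->
</property>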
Does anyone have a clue what the issue is here? Thanks in advance!
Upvotes: 0
Views: 323
Reputation: 1
Thanks to you all for your replies.
The underlying OS is CentOS and it runs on vCloud. As suggested, I went through the IP configuration of all 3 data nodes holding the 3 segments. The nodes were not using the same NIC (IP) on eth1, but on investigating further I found through ifconfig that, along with "eth1" and "lo", another interface named "virbr0" was configured.
This "virbr0" had the same address on all the segment nodes, and that was causing the issue. I removed it from all nodes and then the INSERT query worked.
Below is the ifconfig output; to resolve the issue I removed "virbr0" from all the segment nodes.
eth1      Link encap:Ethernet  HWaddr 00:50:56:01:31:26
          inet addr:2.10.1.74  Bcast:2.10.3.255  Mask:255.255.252.0
          inet6 addr: fe80::250:56ff:fe01:3126/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:426157 errors:0 dropped:0 overruns:0 frame:0
          TX packets:259592 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:361465764 (344.7 MiB)  TX bytes:216951933 (206.9 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:416 (416.0 b)  TX bytes:416 (416.0 b)

virbr0    Link encap:Ethernet  HWaddr 52:54:00:DC:EE:00
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
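For anyone hitting the same thing: virbr0 is the default libvirt NAT bridge. A typical way to remove it on CentOS 6 looks roughly like this (assuming libvirt's "default" network is not needed on the segment hosts; adjust for your setup):
ifconfig virbr0 down                      # take the bridge down
brctl delbr virbr0                        # delete the bridge device
virsh net-destroy default                 # stop libvirt's "default" NAT network
virsh net-autostart default --disable     # keep it from coming back after a reboot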
Upvotes: 0
Reputation: 21
I have met this problem before. In such an environment, every segment host ends up with a common IP address, so please check whether the segment nodes share the same IP address. HAWQ 2.0.0 treats segments with the same IP address as a single node; that is why you have 3 segment nodes but only one segment registered in gp_segment_configuration. You could remove the duplicate IP address and try again.
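A quick way to spot a shared address across the segment hosts (the host names below are placeholders for your own segment list):
for h in node1.localdomain node2.localdomain node3.localdomain; do
    echo "== $h =="
    ssh "$h" "ifconfig -a | grep 'inet addr'"    # list every IPv4 address configured on the host
done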
This issue has been fixed in the latest HAWQ code.
Upvotes: 1