I'm experiencing an issue where the h2o.H2OFrame([1,2,3]) command completes on an internal backend but not on an external backend. On the external backend the frame is created, but the call never returns and the process hangs.
It would appear that a POST to /3/ParseSetup is not returning (urllib3 seems to get stuck there). More specifically, from the h2o logs for a connection to the external backend, an example of this is (where I've shortened the date and IP):
* 10.*.*.15:56565 8120 #7003-141 INFO: Reading byte InputStream into Frame:
* 10.*.*.15:56565 8120 #7003-141 INFO: frameKey: upload_8a440dcf457c1e5deacf76a7ac1a4955
* 10.*.*.15:56565 8120 #7003-141 DEBUG: write-lock upload_8a440dcf457c1e5deacf76a7ac1a4955 by job null
* 10.*.*.15:56565 8120 #7003-141 INFO: totalChunks: 1
* 10.*.*.15:56565 8120 #7003-141 INFO: totalBytes: 21
* 10.*.*.15:56565 8120 #7003-141 DEBUG: unlock upload_8a440dcf457c1e5deacf76a7ac1a4955 by job null
* 10.*.*.15:56565 8120 #7003-141 INFO: Success.
* 10.*.*.15:56565 8120 #7003-135 INFO: POST /3/ParseSetup, parms: {source_frames=["upload_8a440dcf457c1e5deacf76a7ac1a4955"], check_header=1, separator=44}
By comparison, the internal backend completes that call and the log files contain:
** 10.*.*.15:54444 2421 #0581-148 INFO: totalBytes: 21
** 10.*.*.15:54444 2421 #0581-148 INFO: Success.
** 10.*.*.15:54444 2421 #0581-149 INFO: POST /3/ParseSetup, parms: {source_frames=["upload_b985730020211f576ef75143ce0e43f2"], check_header=1, separator=44}
** 10.*.*.15:54444 2421 #0581-150 INFO: POST /3/Parse, parms: {number_columns=1, source_frames=["upload_b985730020211f576ef75143ce0e43f2"], column_types=["Numeric"], single_quotes=False, parse_type=CSV, destination_frame=Key_Frame__upload_b985730020211f576ef75143ce0e43f2.hex, column_names=["C1"], delete_on_done=True, check_header=1, separator=44, blocking=False, chunk_size=4194304}
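The two excerpts above can also be compared mechanically. This is only a minimal sketch, assuming every REST request is logged as "INFO: <METHOD> <endpoint>, parms: ..." as in the lines shown; the helper names rest_calls and missing_calls are hypothetical and not part of h2o.

```python
import re

# Matches the request part of a log line such as
# "... INFO: POST /3/ParseSetup, parms: {...}" (format assumed from the excerpts).
REQUEST_RE = re.compile(r"INFO: (GET|POST) (/\S+?),")


def rest_calls(log_lines):
    """Return the (method, endpoint) pairs found in the log lines, in order."""
    calls = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            calls.append((match.group(1), match.group(2)))
    return calls


def missing_calls(reference_lines, suspect_lines):
    """Requests present in the reference log but never seen in the suspect log."""
    seen = set(rest_calls(suspect_lines))
    return [call for call in rest_calls(reference_lines) if call not in seen]
```

Feeding the internal log as the reference and the external log as the suspect should list the requests the external run never reached, which here would be the POST to /3/Parse.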
There is a difference in the "by job null" lock that occurs, but it is released, so I suspect that it is not a critical issue. I've curled that endpoint unsuccessfully on both backends, and am reviewing the source code to determine why.
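For reference, the same request can be reproduced from Python with an explicit timeout, so that a hang shows up as a timeout instead of blocking forever. This is a rough sketch using only the standard library; it assumes the endpoint accepts the form-encoded parameters exactly as they appear in the logged request, and the function names and the base_url argument are my own placeholders.

```python
import json
import socket
import urllib.error
import urllib.parse
import urllib.request


def parse_setup_body(frame_key):
    """Form-encode the parameters seen in the logged /3/ParseSetup request."""
    params = {
        "source_frames": json.dumps([frame_key]),  # e.g. ["upload_..."]
        "check_header": 1,
        "separator": 44,  # ASCII code of ','
    }
    return urllib.parse.urlencode(params).encode("ascii")


def post_parse_setup(base_url, frame_key, timeout=30):
    """POST to /3/ParseSetup; return the response text, or None on timeout."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/3/ParseSetup",
        data=parse_setup_body(frame_key),
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.URLError as err:
        # Connect-phase timeouts arrive wrapped in URLError.
        if isinstance(err.reason, socket.timeout):
            return None
        raise
    except socket.timeout:
        # Read-phase timeouts are raised directly.
        return None
```

A None result against the external backend, with a normal response from the internal one, would at least confirm that the hang is on the server side of that call rather than in the h2o client.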
I am able to view the uploaded frame, despite the hanging process, and I'm able to retrieve the frame using h2o.get_frame(frame_id="myframe_id") on the external backend.
I've tried/confirmed the following things:
./ cdh5.14, which gave me the h2odriver-sw2.3.0-cdh5.14-extended.jar jar;
hadoop jar h2odriver-sw2.3.0-cdh5.14-extended.jar -Dmapreduce.job.queuename=root.users.myuser -jobname extback -baseport 56565 -nodes 10 -mapperXmx 10g -network 10.*.*.0/24 works;
sdf = session.createDataFrame(
    [('a', 1, 1.0), ('b', 2, 2.0)],
    schema=StructType([StructField("string", StringType()),
                       StructField("int", IntegerType()),
                       StructField("float", FloatType())]))
From a YARN point of view, I attempted client and cluster mode submissions of the simple test app: spark2-submit --master yarn --deploy-mode cluster --queue root.users.myuser --conf 'spark.ext.h2o.client.port.base=65656' and, for the default client mode, the same command without --master yarn and --deploy-mode cluster.
Lastly, the code is:
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
from pysparkling import *
import h2o
# Spark-side settings for the Sparkling Water external backend
conf = SparkConf().setAll([
    ('spark.ext.h2o.client.verbose', True),
    ('spark.ext.h2o.client.log.level', 'DEBUG'),
    ('spark.ext.h2o.node.log.level', 'DEBUG'),
    ('spark.ext.h2o.client.port.base', '56565'),
    ('spark.ext.h2o.backend.cluster.mode', 'external')])
session = SparkSession.builder.config(conf=conf).getOrCreate()
# ip_addr and port (not shown here) identify the manually started external H2O cluster
h2o_conf = (H2OConf(session)
            .set_external_cluster_mode()
            .use_manual_cluster_start()
            .set_h2o_cluster(ip_addr, port)
            .set_cloud_name("extback"))
hc = H2OContext.getOrCreate(session, h2o_conf)
Does anyone know why it may be hanging (in comparison to the internal backend), what I'm doing wrong, or which steps I can take to better debug this? Thanks!
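One debugging step that might help narrow this down is to have Python dump the tracebacks of all threads once the call has been stuck for a while, which should show exactly where urllib3 is blocked. Below is a minimal sketch using the standard-library faulthandler module; the hang_watchdog name is hypothetical, and wrapping h2o.H2OFrame([1, 2, 3]) in it is only an assumed usage.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def hang_watchdog(seconds, file=None):
    """Dump all thread tracebacks to `file` if the body runs longer than `seconds`."""
    target = sys.stderr if file is None else file
    # exit=False keeps the process alive after the dump so it can still be inspected.
    faulthandler.dump_traceback_later(seconds, exit=False, file=target)
    try:
        yield
    finally:
        # Cancel the pending dump if the body finished in time.
        faulthandler.cancel_dump_traceback_later()


# Assumed usage around the hanging call:
# with hang_watchdog(60):
#     h2o.H2OFrame([1, 2, 3])
```

The dump goes to stderr by default, so in cluster mode it would presumably end up in the YARN container logs rather than on the console.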