ROHIT JOSEPH

Reputation: 11

Snappy ingestion into Druid

I am facing a problem ingesting Snappy-compressed files into Druid. Things start to break after `org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.` The task is able to fetch the input file.

My spec JSON file:

{
    "hadoopCoordinates": "org.apache.hadoop:hadoop-client:2.6.0", 
    "spec": {
        "dataSchema": {
            "dataSource": "apps_searchprivacy", 
            "granularitySpec": {
                "intervals": [
                    "2017-01-23T00:00:00.000Z/2017-01-23T01:00:00.000Z"
                ], 
                "queryGranularity": "HOUR", 
                "segmentGranularity": "HOUR", 
                "type": "uniform"
            }, 
            "metricsSpec": [
                {
                    "name": "count", 
                    "type": "count"
                }, 
                {
                    "fieldName": "event_value", 
                    "name": "event_value", 
                    "type": "longSum"
                }, 
                {
                    "fieldName": "landing_impression", 
                    "name": "landing_impression", 
                    "type": "longSum"
                }, 
                 {
                    "fieldName": "user", 
                    "name": "DistinctUsers", 
                    "type": "hyperUnique"
                },
                {
                    "fieldName": "cost", 
                    "name": "cost", 
                    "type": "doubleSum"
                } 
            ], 
            "parser": {
                "parseSpec": {
                    "dimensionsSpec": {
                        "dimensionExclusions": [
                            "landing_page",
                            "skip_url",
                            "ua",
                            "user_id"
                            ], 
                        "dimensions": [
                            "t3",
                            "t2",
                            "t1",
                            "aff_id",
                            "customer",
                            "evt_id",
                            "install_date",
                            "install_week",
                            "install_month",
                            "install_year",
                            "days_since_install",
                            "months_since_install",
                            "weeks_since_install",
                            "success_url",
                            "event",
                            "chrome_version",
                            "value",
                            "event_label",
                            "rand",
                            "type_tag_id",
                            "channel_name",
                            "cid",
                            "log_id",
                            "extension",
                            "os",
                            "device",
                            "browser",
                            "cli_ip",
                            "t4",
                            "t5",
                            "referal_url",
                            "week",
                            "month",
                            "year",
                            "browser_version",
                            "browser_name",
                            "landing_template",
                            "strvalue",
                            "customer_group",
                            "extname",
                            "countrycode",
                            "issp",
                            "spdes",
                            "spsc"                         

                            ],                
                        "spatialDimensions": []
                    }, 
                    "format": "json", 
                    "timestampSpec": {
                        "column": "time_stamp", 
                        "format": "yyyy-MM-dd HH:mm:ss"
                    }
                }, 
                "type": "hadoopyString"
            }
        }, 
        "ioConfig": {
            "inputSpec": {
                "dataGranularity": "hour", 
                "filePattern": ".*\\..*",
                "inputPath": "hdfs://c8-auto-hadoop-service-1.srv.media.net:8020/data/apps_test_output", 
                "pathFormat": "'ts'=yyyyMMddHH", 
                "type": "granularity"
            }, 
            "type": "hadoop"
        }, 
        "tuningConfig": {
            "ignoreInvalidRows": "true",  
            "type": "hadoop", 
            "useCombiner": "false"
        }
    }, 
    "type": "index_hadoop"
}

The error I am getting:

2017-02-03T14:39:50,738 INFO [LocalJobRunner Map Task Executor #0] org.apache.hadoop.mapred.MapTask - (EQUATOR) 0 kvi 26214396(104857584)
2017-02-03T14:39:50,738 INFO [LocalJobRunner Map Task Executor #0] org.apache.hadoop.mapred.MapTask - mapreduce.task.io.sort.mb: 100
2017-02-03T14:39:50,738 INFO [LocalJobRunner Map Task Executor #0] org.apache.hadoop.mapred.MapTask - soft limit at 83886080
2017-02-03T14:39:50,738 INFO [LocalJobRunner Map Task Executor #0] org.apache.hadoop.mapred.MapTask - bufstart = 0; bufvoid = 104857600
2017-02-03T14:39:50,738 INFO [LocalJobRunner Map Task Executor #0] org.apache.hadoop.mapred.MapTask - kvstart = 26214396; length = 6553600
2017-02-03T14:39:50,738 INFO [LocalJobRunner Map Task Executor #0] org.apache.hadoop.mapred.MapTask - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2017-02-03T14:39:50,847 INFO [LocalJobRunner Map Task Executor #0] org.apache.hadoop.mapred.MapTask - Starting flush of map output
2017-02-03T14:39:50,849 INFO [Thread-22] org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2017-02-03T14:39:50,850 WARN [Thread-22] org.apache.hadoop.mapred.LocalJobRunner - job_local233667772_0001
java.lang.Exception: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.6.0.jar:?]
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) [hadoop-mapreduce-client-common-2.6.0.jar:?]
Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
    at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method) ~[hadoop-common-2.6.0.jar:?]
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63) ~[hadoop-common-2.6.0.jar:?]
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:192) ~[hadoop-common-2.6.0.jar:?]
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176) ~[hadoop-common-2.6.0.jar:?]
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:90) ~[hadoop-mapreduce-client-core-2.6.0.jar:?]
    at org.apache.hadoop.mapreduce.lib.input.DelegatingRecordReader.initialize(DelegatingRecordReader.java:84) ~[hadoop-mapreduce-client-core-2.6.0.jar:?]
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:545) ~[hadoop-mapreduce-client-core-2.6.0.jar:?]
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:783) ~[hadoop-mapreduce-client-core-2.6.0.jar:?]
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) ~[hadoop-mapreduce-client-core-2.6.0.jar:?]
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.6.0.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_121]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_121]
    at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_121]
2017-02-03T14:39:51,130 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Job job_local233667772_0001 failed with state FAILED due to: NA
2017-02-03T14:39:51,139 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Counters: 0
2017-02-03T14:39:51,143 INFO [task-runner-0-priority-0] io.druid.indexer.JobHelper - Deleting path[var/druid/hadoop-tmp/apps_searchprivacy/2017-02-03T143903.262Z_bb7a812bc0754d4aabcd4bc103ed648a]
2017-02-03T14:39:51,158 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_apps_searchprivacy_2017-02-03T14:39:03.257Z, type=index_hadoop, dataSource=apps_searchprivacy}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
    at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:204) ~[druid-indexing-service-0.9.2.jar:0.9.2]
    at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:208) ~[druid-indexing-service-0.9.2.jar:0.9.2]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.2.jar:0.9.2]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.2.jar:0.9.2]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_121]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_121]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
    at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.2.jar:0.9.2]
    ... 7 more
Caused by: com.metamx.common.ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed!
    at io.druid.indexer.JobHelper.runJobs(JobHelper.java:369) ~[druid-indexing-hadoop-0.9.2.jar:0.9.2]
    at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:94) ~[druid-indexing-hadoop-0.9.2.jar:0.9.2]
    at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:261) ~[druid-indexing-service-0.9.2.jar:0.9.2]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_121]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_121]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
    at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.2.jar:0.9.2]
    ... 7 more
2017-02-03T14:39:51,165 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_hadoop_apps_searchprivacy_2017-02-03T14:39:03.257Z] status changed to [FAILED].
2017-02-03T14:39:51,168 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_apps_searchprivacy_2017-02-03T14:39:03.257Z",
  "status" : "FAILED",
  "duration" : 43693
}

Upvotes: 1

Views: 438

Answers (1)

Yury Nevinitsin

Reputation: 158

It seems that the JVM can't load a native shared library (a `.dll` or `.so`). Check whether it is available on the machine(s) running the task, and if so, check whether its directory is on the JVM's native library path (`java.library.path`).
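A minimal way to verify this, assuming a standard Hadoop 2.x install (the native library path below is an example; use wherever `libhadoop.so` and `libsnappy.so` actually live on your machines):

```shell
# Check whether Hadoop can load its native libraries, including Snappy,
# on the machine running the Druid task:
hadoop checknative -a
# A healthy install prints something like:
#   hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so
#   snappy:  true /usr/lib/hadoop/lib/native/libsnappy.so.1
# If snappy shows "false", the UnsatisfiedLinkError above is expected.

# To make the library visible to the peon JVMs that Druid spawns for
# Hadoop index tasks, pass java.library.path via the middleManager's
# peon JVM options (property name from Druid's middleManager config;
# the directory is an assumption for your cluster):
#   druid.indexer.runner.javaOpts=-Djava.library.path=/usr/lib/hadoop/lib/native
```

Restart the middleManager after changing the runtime properties so new peons pick up the flag.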

Upvotes: 0
