Bin Wang

Reputation: 2747

Impala memory limit exceeded with simple count query

Edit:

There were some corrupt Avro files in the table. After removing some of them, everything works fine. I had also decompressed these files to JSON with avro-tools, and the decompressed files are not very large either. So it appears to be a bug in how Impala handles corrupt Avro files.
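
For anyone hitting the same problem, here is a sketch of how the bad files can be located (the warehouse path is an assumption for illustration, and avro-tools is run against a local copy of each file):

# Assumed partition path; adjust to your warehouse layout.
PART=/user/hive/warehouse/adhoc_data_fast.db/log/day=2017-04-05
for f in $(hdfs dfs -ls "$PART" | awk '/\.avro$/ {print $NF}'); do
    hdfs dfs -cat "$f" > /tmp/candidate.avro
    # avro-tools exits non-zero when it cannot decode the container file.
    avro-tools tojson /tmp/candidate.avro > /dev/null 2>&1 \
        || echo "possibly corrupt: $f"
done
rm -f /tmp/candidate.avro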


I have an Impala table stored as gzip-compressed Avro, partitioned by "day". When I execute the query:

select count(0) from adhoc_data_fast.log where day='2017-04-05';

It fails with:

Query: select count(0) from adhoc_data_fast.log where day='2017-04-05'
Query submitted at: 2017-04-06 13:35:56 (Coordinator: http://szq7.appadhoc.com:25000)
Query progress can be monitored at: http://szq7.appadhoc.com:25000/query_plan?query_id=ef4698db870efd4d:739c89ef00000000
WARNINGS:
Memory limit exceeded
GzipDecompressor failed to allocate 109051904000 bytes.

Each node is configured with 96 GB of memory, and the single pool memory limit is set to 300 GB. Note that the single failed allocation (109051904000 bytes, roughly 101.5 GiB) is already larger than any one node's memory.
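
For what it's worth, a per-query cap can also be set from impala-shell via the MEM_LIMIT query option (the 2g value here is purely illustrative, not our actual setting):

[szq7.appadhoc.com:21000] > set mem_limit=2g;
[szq7.appadhoc.com:21000] > select count(0) from adhoc_data_fast.log where day='2017-04-05';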

All of the compressed files are smaller than 250 MB:

62M     log.2017-04-05.1491321605834.avro
79M     log.2017-04-05.1491323647211.avro
62M     log.2017-04-05.1491327241311.avro
60M     log.2017-04-05.1491330839609.avro
52M     log.2017-04-05.1491334439092.avro
59M     log.2017-04-05.1491338038503.avro
93M     log.2017-04-05.1491341639694.avro
130M    log.2017-04-05.1491345239969.avro
147M    log.2017-04-05.1491348843931.avro
183M    log.2017-04-05.1491352442955.avro
218M    log.2017-04-05.1491359648079.avro
181M    log.2017-04-05.1491363247597.avro
212M    log.2017-04-05.1491366845827.avro
207M    log.2017-04-05.1491370445873.avro
197M    log.2017-04-05.1491374045830.avro
164M    log.2017-04-05.1491377650935.avro
155M    log.2017-04-05.1491381249597.avro
203M    log.2017-04-05.1491384846366.avro
185M    log.2017-04-05.1491388450262.avro
198M    log.2017-04-05.1491392047694.avro
206M    log.2017-04-05.1491395648818.avro
214M    log.2017-04-05.1491399246407.avro
167M    log.2017-04-05.1491402846469.avro
77M     log.2017-04-05.1491406180615.avro
3.2M    log.2017-04-05.1491409790105.avro
1.3M    log.2017-04-05.1491413385884.avro
928K    log.2017-04-05.1491416981829.avro
832K    log.2017-04-05.1491420581588.avro
1.1M    log.2017-04-05.1491424180191.avro
2.6M    log.2017-04-05.1491427781339.avro
3.8M    log.2017-04-05.1491431382552.avro
3.3M    log.2017-04-05.1491434984679.avro
5.2M    log.2017-04-05.1491438586674.avro
5.1M    log.2017-04-05.1491442192541.avro
2.3M    log.2017-04-05.1491445789230.avro
884K    log.2017-04-05.1491449386630.avro

I also fetched them from HDFS and converted them to JSON with avro-tools to check the decompressed sizes (the exact command is sketched after the listing). The decompressed files are no larger than 1 GB:

16M     log.2017-04-05.1491321605834.avro.json
308M    log.2017-04-05.1491323647211.avro.json
103M    log.2017-04-05.1491327241311.avro.json
150M    log.2017-04-05.1491330839609.avro.json
397M    log.2017-04-05.1491334439092.avro.json
297M    log.2017-04-05.1491338038503.avro.json
160M    log.2017-04-05.1491341639694.avro.json
95M     log.2017-04-05.1491345239969.avro.json
360M    log.2017-04-05.1491348843931.avro.json
338M    log.2017-04-05.1491352442955.avro.json
71M     log.2017-04-05.1491359648079.avro.json
161M    log.2017-04-05.1491363247597.avro.json
628M    log.2017-04-05.1491366845827.avro.json
288M    log.2017-04-05.1491370445873.avro.json
162M    log.2017-04-05.1491374045830.avro.json
90M     log.2017-04-05.1491377650935.avro.json
269M    log.2017-04-05.1491381249597.avro.json
620M    log.2017-04-05.1491384846366.avro.json
70M     log.2017-04-05.1491388450262.avro.json
30M     log.2017-04-05.1491392047694.avro.json
114M    log.2017-04-05.1491395648818.avro.json
370M    log.2017-04-05.1491399246407.avro.json
359M    log.2017-04-05.1491402846469.avro.json
218M    log.2017-04-05.1491406180615.avro.json
29M     log.2017-04-05.1491409790105.avro.json
3.9M    log.2017-04-05.1491413385884.avro.json
9.3M    log.2017-04-05.1491416981829.avro.json
8.3M    log.2017-04-05.1491420581588.avro.json
2.3M    log.2017-04-05.1491424180191.avro.json
25M     log.2017-04-05.1491427781339.avro.json
24M     log.2017-04-05.1491431382552.avro.json
5.7M    log.2017-04-05.1491434984679.avro.json
35M     log.2017-04-05.1491438586674.avro.json
5.8M    log.2017-04-05.1491442192541.avro.json
23M     log.2017-04-05.1491445789230.avro.json
4.3M    log.2017-04-05.1491449386630.avro.json
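
For reference, the conversion above was done per file with avro-tools tojson; a sketch, run in the directory holding the local copies:

for f in log.2017-04-05.*.avro; do
    avro-tools tojson "$f" > "$f.json"
done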

And here is the Impala query profile:

[szq7.appadhoc.com:21000] > profile;
Query Runtime Profile:
Query (id=ef4698db870efd4d:739c89ef00000000):
  Summary:
    Session ID: f54bb090170bcdb6:621ac5796ef2668c
    Session Type: BEESWAX
    Start Time: 2017-04-06 13:35:56.454441000
    End Time: 2017-04-06 13:35:57.326967000
    Query Type: QUERY
    Query State: EXCEPTION
    Query Status:
Memory limit exceeded
GzipDecompressor failed to allocate 109051904000 bytes.

    Impala Version: impalad version 2.7.0-cdh5.9.1 RELEASE (build 24ad6df788d66e4af9496edb26ac4d1f1d2a1f2c)
    User: ubuntu
    Connected User: ubuntu
    Delegated User:
    Network Address: ::ffff:192.168.1.7:29026
    Default Db: default
    Sql Statement: select count(0) from adhoc_data_fast.log where day='2017-04-05'
    Coordinator: szq7.appadhoc.com:22000
    Query Options (non default):
    Plan:
----------------
Estimated Per-Host Requirements: Memory=410.00MB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
adhoc_data_fast.log

03:AGGREGATE [FINALIZE]
|  output: count:merge(0)
|  hosts=13 per-host-mem=unavailable
|  tuple-ids=1 row-size=8B cardinality=1
|
02:EXCHANGE [UNPARTITIONED]
|  hosts=13 per-host-mem=unavailable
|  tuple-ids=1 row-size=8B cardinality=1
|
01:AGGREGATE
|  output: count(0)
|  hosts=13 per-host-mem=10.00MB
|  tuple-ids=1 row-size=8B cardinality=1
|
00:SCAN HDFS [adhoc_data_fast.log, RANDOM]
   partitions=1/7594 files=38 size=3.45GB
   table stats: unavailable
   column stats: all
   hosts=13 per-host-mem=400.00MB
   tuple-ids=0 row-size=0B cardinality=unavailable
----------------
    Estimated Per-Host Mem: 429916160
    Estimated Per-Host VCores: 1
    Tables Missing Stats: adhoc_data_fast.log
    Request Pool: default-pool
    Admission result: Admitted immediately
    ExecSummary:
Operator       #Hosts   Avg Time   Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
-------------------------------------------------------------------------------------------------------------
03:AGGREGATE        1   52.298ms   52.298ms      0           1   4.00 KB        -1.00 B  FINALIZE
02:EXCHANGE         1  676.993ms  676.993ms      0           1         0        -1.00 B  UNPARTITIONED
01:AGGREGATE        0    0.000ns    0.000ns      0           1         0       10.00 MB
00:SCAN HDFS        0    0.000ns    0.000ns      0          -1         0      400.00 MB  adhoc_data_fast.log
    Planner Timeline: 69.589ms
       - Analysis finished: 6.642ms (6.642ms)
       - Equivalence classes computed: 6.980ms (337.753us)
       - Single node plan created: 13.302ms (6.322ms)
       - Runtime filters computed: 13.368ms (65.984us)
       - Distributed plan created: 15.131ms (1.763ms)
       - Lineage info computed: 16.488ms (1.356ms)
       - Planning finished: 69.589ms (53.101ms)
    Query Timeline: 874.026ms
       - Start execution: 63.320us (63.320us)
       - Planning finished: 72.764ms (72.701ms)
       - Submit for admission: 73.592ms (827.496us)
       - Completed admission: 73.775ms (183.088us)
       - Ready to start 13 remote fragments: 126.950ms (53.175ms)
       - All 13 remote fragments started: 161.919ms (34.968ms)
       - Rows available: 856.761ms (694.842ms)
       - Unregister query: 872.527ms (15.765ms)
     - ComputeScanRangeAssignmentTimer: 356.136us
  ImpalaServer:
     - ClientFetchWaitTimer: 0.000ns
     - RowMaterializationTimer: 0.000ns
  Execution Profile ef4698db870efd4d:739c89ef00000000:(Total: 782.712ms, non-child: 0.000ns, % non-child: 0.00%)
    Number of filters: 0
    Filter routing table:
 ID  Src. Node  Tgt. Node(s)  Targets  Target type  Partition filter  Pending (Expected)  First arrived  Completed   Enabled
----------------------------------------------------------------------------------------------------------------------------

    Fragment start latencies: Count: 13, 25th %-ile: 1ms, 50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 2ms, 95th %-ile: 2ms, 99.9th %-ile: 35ms
    Per Node Peak Memory Usage: szq15.appadhoc.com:22000(0) szq1.appadhoc.com:22000(0) szq13.appadhoc.com:22000(0) szq12.appadhoc.com:22000(0) szq11.appadhoc.com:22000(0) szq20.appadhoc.com:22000(0) szq14.appadhoc.com:22000(0) szq8.appadhoc.com:22000(0) szq5.appadhoc.com:22000(0) szq9.appadhoc.com:22000(0) szq4.appadhoc.com:22000(0) szq6.appadhoc.com:22000(0) szq7.appadhoc.com:22000(0)
     - FiltersReceived: 0 (0)
     - FinalizationTimer: 0.000ns
    Coordinator Fragment F01:(Total: 729.811ms, non-child: 0.000ns, % non-child: 0.00%)
      MemoryUsage(500.000ms): 12.00 KB
       - AverageThreadTokens: 0.00
       - BloomFilterBytes: 0
       - PeakMemoryUsage: 12.00 KB (12288)
       - PerHostPeakMemUsage: 0
       - PrepareTime: 52.291ms
       - RowsProduced: 0 (0)
       - TotalCpuTime: 0.000ns
       - TotalNetworkReceiveTime: 676.991ms
       - TotalNetworkSendTime: 0.000ns
       - TotalStorageWaitTime: 0.000ns
      BlockMgr:
         - BlockWritesOutstanding: 0 (0)
         - BlocksCreated: 0 (0)
         - BlocksRecycled: 0 (0)
         - BufferedPins: 0 (0)
         - BytesWritten: 0
         - MaxBlockSize: 8.00 MB (8388608)
         - MemoryLimit: 102.40 GB (109951164416)
         - PeakMemoryUsage: 0
         - TotalBufferWaitTime: 0.000ns
         - TotalEncryptionTime: 0.000ns
         - TotalIntegrityCheckTime: 0.000ns
         - TotalReadBlockTime: 0.000ns
      CodeGen:(Total: 63.837ms, non-child: 63.837ms, % non-child: 100.00%)
         - CodegenTime: 828.728us
         - CompileTime: 2.957ms
         - LoadTime: 0.000ns
         - ModuleBitcodeSize: 1.89 MB (1984232)
         - NumFunctions: 7 (7)
         - NumInstructions: 96 (96)
         - OptimizationTime: 8.070ms
         - PrepareTime: 51.769ms
      AGGREGATION_NODE (id=3):(Total: 729.291ms, non-child: 52.298ms, % non-child: 7.17%)
        ExecOption: Codegen Enabled
         - BuildTime: 0.000ns
         - GetResultsTime: 0.000ns
         - HTResizeTime: 0.000ns
         - HashBuckets: 0 (0)
         - LargestPartitionPercent: 0 (0)
         - MaxPartitionLevel: 0 (0)
         - NumRepartitions: 0 (0)
         - PartitionsCreated: 0 (0)
         - PeakMemoryUsage: 4.00 KB (4096)
         - RowsRepartitioned: 0 (0)
         - RowsReturned: 0 (0)
         - RowsReturnedRate: 0
         - SpilledPartitions: 0 (0)
      EXCHANGE_NODE (id=2):(Total: 676.993ms, non-child: 676.993ms, % non-child: 100.00%)
        BytesReceived(500.000ms): 0
         - BytesReceived: 0
         - ConvertRowBatchTime: 0.000ns
         - DeserializeRowBatchTimer: 0.000ns
         - FirstBatchArrivalWaitTime: 0.000ns
         - PeakMemoryUsage: 0
         - RowsReturned: 0 (0)
         - RowsReturnedRate: 0
         - SendersBlockedTimer: 0.000ns
         - SendersBlockedTotalTimer(*): 0.000ns
    Averaged Fragment F00:
      split sizes:  min: 114.60 MB, max: 451.79 MB, avg: 271.65 MB, stddev: 104.16 MB
      completion times: min:694.632ms  max:728.356ms  mean: 725.379ms  stddev:8.878ms
      execution rates: min:157.45 MB/sec  max:620.68 MB/sec  mean:374.89 MB/sec  stddev:144.30 MB/sec
      num instances: 13
    Fragment F00:
      Instance ef4698db870efd4d:739c89ef00000001 (host=szq5.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef00000002 (host=szq8.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef00000003 (host=szq14.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef00000004 (host=szq20.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef00000005 (host=szq11.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef00000006 (host=szq12.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef00000007 (host=szq13.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef00000008 (host=szq1.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef00000009 (host=szq15.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef0000000a (host=szq6.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef0000000b (host=szq4.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef0000000c (host=szq9.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef0000000d (host=szq7.appadhoc.com:22000):

So why does Impala need so much memory?

Upvotes: 0

Views: 4520

Answers (1)

Toby Scamell

Reputation: 151

It could be that Impala is missing statistics on your table for that partition. The explain plan highlights the following:

Estimated Per-Host Requirements: Memory=410.00MB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
adhoc_data_fast.log

Try running COMPUTE STATS on the table, or COMPUTE INCREMENTAL STATS for just that partition.

e.g.

COMPUTE INCREMENTAL STATS adhoc_data_fast.log PARTITION (day='2017-04-05');

This will help Impala when it does its resource planning. I would be surprised if this fixes it, but it's worth a shot initially.
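
You can then check that the stats were recorded, e.g.:

SHOW TABLE STATS adhoc_data_fast.log;

The #Rows column should no longer be -1 for day='2017-04-05' once the incremental stats have been computed.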

Upvotes: 1
