Reputation: 2747
Edit:
There are some corrupt AVRO files in the table. After removing them, everything works fine. I had decompressed these files to JSON with avro-tools, and the decompressed files are not very large either, so this looks like a bug in how Impala handles corrupt AVRO files.
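For reference, this is roughly the check I ran on each file (a sketch; the HDFS path here is a placeholder, and I'm assuming the avro-tools wrapper script from the CDH packages is on the PATH):
# Fetch a suspect file and try to decode it; avro-tools throws an
# exception and exits non-zero when the file is corrupt.
hdfs dfs -get /path/to/adhoc_data_fast/log/day=2017-04-05/log.2017-04-05.1491321605834.avro .
avro-tools tojson log.2017-04-05.1491321605834.avro > log.2017-04-05.1491321605834.avro.json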
I have an Impala table in gzip-compressed AVRO format, partitioned by "day". When I execute this query:
select count(0) from adhoc_data_fast.log where day='2017-04-05';
it fails with:
Query: select count(0) from adhoc_data_fast.log where day='2017-04-05'
Query submitted at: 2017-04-06 13:35:56 (Coordinator: http://szq7.appadhoc.com:25000)
Query progress can be monitored at: http://szq7.appadhoc.com:25000/query_plan?query_id=ef4698db870efd4d:739c89ef00000000
WARNINGS:
Memory limit exceeded
GzipDecompressor failed to allocate 109051904000 bytes.
Each node is configured with 96 GB of memory, and the single-pool memory limit is set to 300 GB.
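As a sanity check that no per-query limit was overriding the pool settings, the effective query options (including MEM_LIMIT, where 0 means no per-query limit is set) can be listed from impala-shell with:
[szq7.appadhoc.com:21000] > set;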
After compression, none of the files is larger than 250 MB:
62M log.2017-04-05.1491321605834.avro
79M log.2017-04-05.1491323647211.avro
62M log.2017-04-05.1491327241311.avro
60M log.2017-04-05.1491330839609.avro
52M log.2017-04-05.1491334439092.avro
59M log.2017-04-05.1491338038503.avro
93M log.2017-04-05.1491341639694.avro
130M log.2017-04-05.1491345239969.avro
147M log.2017-04-05.1491348843931.avro
183M log.2017-04-05.1491352442955.avro
218M log.2017-04-05.1491359648079.avro
181M log.2017-04-05.1491363247597.avro
212M log.2017-04-05.1491366845827.avro
207M log.2017-04-05.1491370445873.avro
197M log.2017-04-05.1491374045830.avro
164M log.2017-04-05.1491377650935.avro
155M log.2017-04-05.1491381249597.avro
203M log.2017-04-05.1491384846366.avro
185M log.2017-04-05.1491388450262.avro
198M log.2017-04-05.1491392047694.avro
206M log.2017-04-05.1491395648818.avro
214M log.2017-04-05.1491399246407.avro
167M log.2017-04-05.1491402846469.avro
77M log.2017-04-05.1491406180615.avro
3.2M log.2017-04-05.1491409790105.avro
1.3M log.2017-04-05.1491413385884.avro
928K log.2017-04-05.1491416981829.avro
832K log.2017-04-05.1491420581588.avro
1.1M log.2017-04-05.1491424180191.avro
2.6M log.2017-04-05.1491427781339.avro
3.8M log.2017-04-05.1491431382552.avro
3.3M log.2017-04-05.1491434984679.avro
5.2M log.2017-04-05.1491438586674.avro
5.1M log.2017-04-05.1491442192541.avro
2.3M log.2017-04-05.1491445789230.avro
884K log.2017-04-05.1491449386630.avro
I fetched them from HDFS and used avro-tools to convert them to JSON, which decompresses them. The decompressed files are no larger than 1 GB:
16M log.2017-04-05.1491321605834.avro.json
308M log.2017-04-05.1491323647211.avro.json
103M log.2017-04-05.1491327241311.avro.json
150M log.2017-04-05.1491330839609.avro.json
397M log.2017-04-05.1491334439092.avro.json
297M log.2017-04-05.1491338038503.avro.json
160M log.2017-04-05.1491341639694.avro.json
95M log.2017-04-05.1491345239969.avro.json
360M log.2017-04-05.1491348843931.avro.json
338M log.2017-04-05.1491352442955.avro.json
71M log.2017-04-05.1491359648079.avro.json
161M log.2017-04-05.1491363247597.avro.json
628M log.2017-04-05.1491366845827.avro.json
288M log.2017-04-05.1491370445873.avro.json
162M log.2017-04-05.1491374045830.avro.json
90M log.2017-04-05.1491377650935.avro.json
269M log.2017-04-05.1491381249597.avro.json
620M log.2017-04-05.1491384846366.avro.json
70M log.2017-04-05.1491388450262.avro.json
30M log.2017-04-05.1491392047694.avro.json
114M log.2017-04-05.1491395648818.avro.json
370M log.2017-04-05.1491399246407.avro.json
359M log.2017-04-05.1491402846469.avro.json
218M log.2017-04-05.1491406180615.avro.json
29M log.2017-04-05.1491409790105.avro.json
3.9M log.2017-04-05.1491413385884.avro.json
9.3M log.2017-04-05.1491416981829.avro.json
8.3M log.2017-04-05.1491420581588.avro.json
2.3M log.2017-04-05.1491424180191.avro.json
25M log.2017-04-05.1491427781339.avro.json
24M log.2017-04-05.1491431382552.avro.json
5.7M log.2017-04-05.1491434984679.avro.json
35M log.2017-04-05.1491438586674.avro.json
5.8M log.2017-04-05.1491442192541.avro.json
23M log.2017-04-05.1491445789230.avro.json
4.3M log.2017-04-05.1491449386630.avro.json
And here is the Impala query profile:
[szq7.appadhoc.com:21000] > profile;
Query Runtime Profile:
Query (id=ef4698db870efd4d:739c89ef00000000):
Summary:
Session ID: f54bb090170bcdb6:621ac5796ef2668c
Session Type: BEESWAX
Start Time: 2017-04-06 13:35:56.454441000
End Time: 2017-04-06 13:35:57.326967000
Query Type: QUERY
Query State: EXCEPTION
Query Status:
Memory limit exceeded
GzipDecompressor failed to allocate 109051904000 bytes.
Impala Version: impalad version 2.7.0-cdh5.9.1 RELEASE (build 24ad6df788d66e4af9496edb26ac4d1f1d2a1f2c)
User: ubuntu
Connected User: ubuntu
Delegated User:
Network Address: ::ffff:192.168.1.7:29026
Default Db: default
Sql Statement: select count(0) from adhoc_data_fast.log where day='2017-04-05'
Coordinator: szq7.appadhoc.com:22000
Query Options (non default):
Plan:
----------------
Estimated Per-Host Requirements: Memory=410.00MB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
adhoc_data_fast.log
03:AGGREGATE [FINALIZE]
| output: count:merge(0)
| hosts=13 per-host-mem=unavailable
| tuple-ids=1 row-size=8B cardinality=1
|
02:EXCHANGE [UNPARTITIONED]
| hosts=13 per-host-mem=unavailable
| tuple-ids=1 row-size=8B cardinality=1
|
01:AGGREGATE
| output: count(0)
| hosts=13 per-host-mem=10.00MB
| tuple-ids=1 row-size=8B cardinality=1
|
00:SCAN HDFS [adhoc_data_fast.log, RANDOM]
partitions=1/7594 files=38 size=3.45GB
table stats: unavailable
column stats: all
hosts=13 per-host-mem=400.00MB
tuple-ids=0 row-size=0B cardinality=unavailable
----------------
Estimated Per-Host Mem: 429916160
Estimated Per-Host VCores: 1
Tables Missing Stats: adhoc_data_fast.log
Request Pool: default-pool
Admission result: Admitted immediately
ExecSummary:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
-------------------------------------------------------------------------------------------------------------
03:AGGREGATE 1 52.298ms 52.298ms 0 1 4.00 KB -1.00 B FINALIZE
02:EXCHANGE 1 676.993ms 676.993ms 0 1 0 -1.00 B UNPARTITIONED
01:AGGREGATE 0 0.000ns 0.000ns 0 1 0 10.00 MB
00:SCAN HDFS 0 0.000ns 0.000ns 0 -1 0 400.00 MB adhoc_data_fast.log
Planner Timeline: 69.589ms
- Analysis finished: 6.642ms (6.642ms)
- Equivalence classes computed: 6.980ms (337.753us)
- Single node plan created: 13.302ms (6.322ms)
- Runtime filters computed: 13.368ms (65.984us)
- Distributed plan created: 15.131ms (1.763ms)
- Lineage info computed: 16.488ms (1.356ms)
- Planning finished: 69.589ms (53.101ms)
Query Timeline: 874.026ms
- Start execution: 63.320us (63.320us)
- Planning finished: 72.764ms (72.701ms)
- Submit for admission: 73.592ms (827.496us)
- Completed admission: 73.775ms (183.088us)
- Ready to start 13 remote fragments: 126.950ms (53.175ms)
- All 13 remote fragments started: 161.919ms (34.968ms)
- Rows available: 856.761ms (694.842ms)
- Unregister query: 872.527ms (15.765ms)
- ComputeScanRangeAssignmentTimer: 356.136us
ImpalaServer:
- ClientFetchWaitTimer: 0.000ns
- RowMaterializationTimer: 0.000ns
Execution Profile ef4698db870efd4d:739c89ef00000000:(Total: 782.712ms, non-child: 0.000ns, % non-child: 0.00%)
Number of filters: 0
Filter routing table:
ID Src. Node Tgt. Node(s) Targets Target type Partition filter Pending (Expected) First arrived Completed Enabled
----------------------------------------------------------------------------------------------------------------------------
Fragment start latencies: Count: 13, 25th %-ile: 1ms, 50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 2ms, 95th %-ile: 2ms, 99.9th %-ile: 35ms
Per Node Peak Memory Usage: szq15.appadhoc.com:22000(0) szq1.appadhoc.com:22000(0) szq13.appadhoc.com:22000(0) szq12.appadhoc.com:22000(0) szq11.appadhoc.com:22000(0) szq20.appadhoc.com:22000(0) szq14.appadhoc.com:22000(0) szq8.appadhoc.com:22000(0) szq5.appadhoc.com:22000(0) szq9.appadhoc.com:22000(0) szq4.appadhoc.com:22000(0) szq6.appadhoc.com:22000(0) szq7.appadhoc.com:22000(0)
- FiltersReceived: 0 (0)
- FinalizationTimer: 0.000ns
Coordinator Fragment F01:(Total: 729.811ms, non-child: 0.000ns, % non-child: 0.00%)
MemoryUsage(500.000ms): 12.00 KB
- AverageThreadTokens: 0.00
- BloomFilterBytes: 0
- PeakMemoryUsage: 12.00 KB (12288)
- PerHostPeakMemUsage: 0
- PrepareTime: 52.291ms
- RowsProduced: 0 (0)
- TotalCpuTime: 0.000ns
- TotalNetworkReceiveTime: 676.991ms
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 0.000ns
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 102.40 GB (109951164416)
- PeakMemoryUsage: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalIntegrityCheckTime: 0.000ns
- TotalReadBlockTime: 0.000ns
CodeGen:(Total: 63.837ms, non-child: 63.837ms, % non-child: 100.00%)
- CodegenTime: 828.728us
- CompileTime: 2.957ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.89 MB (1984232)
- NumFunctions: 7 (7)
- NumInstructions: 96 (96)
- OptimizationTime: 8.070ms
- PrepareTime: 51.769ms
AGGREGATION_NODE (id=3):(Total: 729.291ms, non-child: 52.298ms, % non-child: 7.17%)
ExecOption: Codegen Enabled
- BuildTime: 0.000ns
- GetResultsTime: 0.000ns
- HTResizeTime: 0.000ns
- HashBuckets: 0 (0)
- LargestPartitionPercent: 0 (0)
- MaxPartitionLevel: 0 (0)
- NumRepartitions: 0 (0)
- PartitionsCreated: 0 (0)
- PeakMemoryUsage: 4.00 KB (4096)
- RowsRepartitioned: 0 (0)
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- SpilledPartitions: 0 (0)
EXCHANGE_NODE (id=2):(Total: 676.993ms, non-child: 676.993ms, % non-child: 100.00%)
BytesReceived(500.000ms): 0
- BytesReceived: 0
- ConvertRowBatchTime: 0.000ns
- DeserializeRowBatchTimer: 0.000ns
- FirstBatchArrivalWaitTime: 0.000ns
- PeakMemoryUsage: 0
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- SendersBlockedTimer: 0.000ns
- SendersBlockedTotalTimer(*): 0.000ns
Averaged Fragment F00:
split sizes: min: 114.60 MB, max: 451.79 MB, avg: 271.65 MB, stddev: 104.16 MB
completion times: min:694.632ms max:728.356ms mean: 725.379ms stddev:8.878ms
execution rates: min:157.45 MB/sec max:620.68 MB/sec mean:374.89 MB/sec stddev:144.30 MB/sec
num instances: 13
Fragment F00:
Instance ef4698db870efd4d:739c89ef00000001 (host=szq5.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000002 (host=szq8.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000003 (host=szq14.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000004 (host=szq20.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000005 (host=szq11.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000006 (host=szq12.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000007 (host=szq13.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000008 (host=szq1.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000009 (host=szq15.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef0000000a (host=szq6.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef0000000b (host=szq4.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef0000000c (host=szq9.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef0000000d (host=szq7.appadhoc.com:22000):
So why does Impala need so much memory?
Upvotes: 0
Views: 4520
Reputation: 151
It could be that Impala is missing statistics on your table for that partition. The explain plan highlights the following:
Estimated Per-Host Requirements: Memory=410.00MB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
adhoc_data_fast.log
Try running COMPUTE STATS on the table, or COMPUTE INCREMENTAL STATS for just that partition.
e.g.
COMPUTE INCREMENTAL STATS adhoc_data_fast.log PARTITION (day='2017-04-05');
This will help Impala with its resource planning. I would be surprised if this fixes it, but it's worth a shot initially.
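To verify that the stats were picked up afterwards, you can check with:
SHOW TABLE STATS adhoc_data_fast.log;
SHOW COLUMN STATS adhoc_data_fast.log;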
Upvotes: 1