Reputation: 141
Say we have two Hive tables, tableA and tableB. I am exploding tableA, joining it with a few other tables, and then inserting the result into tableB.
The insert works fine when tableB has no partitions, or when the insert uses a static partition.
However, with a dynamic partition, the MapReduce jobs don't even start. The query appears to hang.
To debug further, I passed the following parameter when starting Hive:
-hiveconf hive.root.logger=DEBUG,console
Now, I can see that the job is not actually hung. It is continuously printing logs like:
........
16/02/11 09:25:50 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
16/02/11 09:25:50 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2139 and EX_2140 as parent of FS_68 and child of EX_2138
16/02/11 09:25:55 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
16/02/11 09:25:55 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2141 and EX_2142 as parent of FS_68 and child of EX_2140
16/02/11 09:25:59 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
16/02/11 09:25:59 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2143 and EX_2144 as parent of FS_68 and child of EX_2142
16/02/11 09:26:03 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
16/02/11 09:26:03 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2145 and EX_2146 as parent of FS_68 and child of EX_2144
16/02/11 09:26:08 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
16/02/11 09:26:08 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2147 and EX_2148 as parent of FS_68 and child of EX_2146
16/02/11 09:26:12 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
16/02/11 09:26:12 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2149 and EX_2150 as parent of FS_68 and child of EX_2148
16/02/11 09:26:17 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
16/02/11 09:26:17 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2151 and EX_2152 as parent of FS_68 and child of EX_2150
16/02/11 09:26:19 [Thread-5]: INFO metrics.MetricsSaver: Saved 8:22 records to /mnt/var/em/raw/i-63eec5e6_20160211_RunJar_14276_raw.bin
16/02/11 09:26:21 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
16/02/11 09:26:21 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2153 and EX_2154 as parent of FS_68 and child of EX_2152
16/02/11 09:26:26 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
16/02/11 09:26:26 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2155 and EX_2156 as parent of FS_68 and child of EX_2154
16/02/11 09:26:30 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
16/02/11 09:26:30 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2157 and EX_2158 as parent of FS_68 and child of EX_2156
16/02/11 09:26:35 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
16/02/11 09:26:35 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2159 and EX_2160 as parent of FS_68 and child of EX_2158
16/02/11 09:26:40 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
16/02/11 09:26:40 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2161 and EX_2162 as parent of FS_68 and child of EX_2160
16/02/11 09:26:45 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
16/02/11 09:26:45 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2163 and EX_2164 as parent of FS_68 and child of EX_2162
16/02/11 09:26:49 [Thread-5]: INFO metrics.MetricsSaver: Saved 8:22 records to /mnt/var/em/raw/i-63eec5e6_20160211_RunJar_14276_raw.bin
16/02/11 09:26:50 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
16/02/11 09:26:50 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2165 and EX_2166 as parent of FS_68 and child of EX_2164
16/02/11 09:26:56 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
16/02/11 09:26:56 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2167 and EX_2168 as parent of FS_68 and child of EX_2166
..............
These log lines keep printing indefinitely. Without the dynamic partition, however, the complete insert query finishes successfully in about 10 minutes.
Also, the dynamic partition column has only 3 distinct values across the whole table, so it's not a case of choosing an unsuitable column for dynamic partitioning.
Hence, my questions are:
What do these log messages mean?
What optimization or remedy does this situation require?
Thanks in advance for any help!
Upvotes: 1
Views: 1271
Reputation: 141
Setting the following parameter worked:
SET hive.optimize.sort.dynamic.partition=false;
My Hive version is 0.13.1. Quoting the Apache wiki entry for this parameter:
hive.optimize.sort.dynamic.partition
Default Value: true in Hive 0.13.0 and 0.13.1; false in Hive 0.14.0 and later (HIVE-8151)
Added In: Hive 0.13.0 with HIVE-6455
When enabled, dynamic partitioning column will be globally sorted. This way we can keep only one record writer open for each partition value in the reducer thereby reducing the memory pressure on reducers.
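For anyone who wants to see where the setting goes, here is a minimal sketch of a session that disables the optimization before a dynamic-partition insert. Only tableA, tableB and hive.optimize.sort.dynamic.partition come from this thread; the column names (id, items, part_col) are hypothetical placeholders, the joins are left out, and the two hive.exec.dynamic.partition settings are assumed to be needed in your environment.

```sql
-- Allow dynamic partitioning without requiring a static partition key
-- (assumption: these are not already set in your configuration).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The workaround from this answer: turn off the sorted dynamic
-- partition optimization (enabled by default in Hive 0.13.0 and 0.13.1).
SET hive.optimize.sort.dynamic.partition=false;

-- Hypothetical query shape: explode an array column of tableA and
-- insert into tableB, which is partitioned by part_col.
-- The dynamic partition column must come last in the SELECT list.
INSERT INTO TABLE tableB PARTITION (part_col)
SELECT a.id, t.item, a.part_col
FROM tableA a
LATERAL VIEW explode(a.items) t AS item;
```

The SET statements only affect the current session, so the default behaviour is unchanged for other queries.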
Thanks.
Upvotes: 2