Reputation: 5397
My Hive query is throwing this exception:
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 1
2013-05-22 12:08:32,634 Stage-1 map = 0%, reduce = 0%
2013-05-22 12:09:19,984 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201305221200_0001 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201305221200_0001_m_000007 (and more) from job job_201305221200_0001
Examining task ID: task_201305221200_0001_m_000003 (and more) from job job_201305221200_0001
Examining task ID: task_201305221200_0001_m_000001 (and more) from job job_201305221200_0001
Task with the most failures(4):
-----
Task ID:
task_201305221200_0001_m_000001
URL:
http://ip-10-134-7-119.ap-southeast-1.compute.internal:9100/taskdetails.jsp?jobid=job_201305221200_0001&tipid=task_201305221200_0001_m_000001
Possible error:
Out of memory due to hash maps used in map-side aggregation.
Solution:
Currently hive.map.aggr.hash.percentmemory is set to 0.5. Try setting it to a lower value. i.e 'set hive.map.aggr.hash.percentmemory = 0.25;'
-----
Counters:
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
select
uri,
count(*) as hits
from
iislog
where
substr(cs_cookie,instr(cs_Cookie,'cwc'),30) like '%CWC%'
and uri like '%.aspx%'
and logdate = '2013-02-07'
group by uri
order by hits Desc;
I tried this on 8 EMR core instances with 1 large master instance, on 8 GB of data. First I tried an external table (its location is an S3 path); then I downloaded the data from S3 to EMR and used a native Hive table, but I got the same error in both cases.
FYI, I am using the regex SerDe to parse the IIS logs:
'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" ="([0-9-]+) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\".*\"|[^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\".*\"|[^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\".*\"|[^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([0-9-]+ [0-9:.]+) ([^ ]*) ([^ ]*) (\".*\"|[^ ]*) ([0-9-]+ [0-9:.]+)",
"output.format.string"="%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s %10$s %11$s %12$s %13$s %14$s %15$s %16$s %17$s %18$s %19$s %20$s %21$s %22$s %23$s %24$s %25$s %26$s %27$s %28$s %29$s %30$s %31$s %32$s")
location 's3://*******';
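For reference, the table DDL around that SerDe clause looks roughly like this (a trimmed sketch with placeholder column names; the real table declares one string column per capture group of the regex):

create external table iislog (
  logdate string,
  cs_cookie string,
  uri string
  -- ...remaining columns, one string column per capture group...
)
row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties (
  "input.regex" = "<the regex above>",
  "output.format.string" = "<the format string above>"
)
location 's3://*******';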
Upvotes: 1
Views: 2762
Reputation: 3824
Have you tried setting
set hive.map.aggr.hash.percentmemory = 0.25;
as suggested in the error message? You can read more here.
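For example, in the same Hive session, right before re-running the query (a sketch; disabling map-side aggregation altogether is the heavier fallback if lowering the threshold is not enough):

-- let the map-side aggregation hash map use at most 25% of the map-task heap
set hive.map.aggr.hash.percentmemory = 0.25;

-- if the task still runs out of memory, push all aggregation to the reducers
-- set hive.map.aggr = false;

-- then re-run the same SELECT ... GROUP BY uri query in this session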
Upvotes: 1
Reputation: 1214
It would be better if you could paste the query, so one can figure out whether the mapper is sorting as well or not.
In any case, we need to increase the amount of memory. Check how much memory the map tasks are configured to run with (mapred.child..., as sketched below); it should be at least about 1 GB. If that is already large enough, you can:
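To check and raise that heap from the Hive CLI, something like this (a sketch using the old-style mapred.child.java.opts name; the exact property name can differ by Hadoop version):

-- with no value, 'set' prints the current setting
set mapred.child.java.opts;

-- give each task JVM roughly 1 GB of heap
set mapred.child.java.opts=-Xmx1024m;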
Upvotes: 1