Ameya

Reputation: 83

Pig script error: java.lang.NoSuchMethodError: org.apache.thrift.protocol.TProtocol.getScheme

I am running a Pig script in mapreduce mode. The script reads an RCFile (containing Thrift-serialized data stored in GZIP-compressed format), deserializes it using a UDF, extracts certain fields from the Thrift struct, and stores them.

Some of the mappers fail with the following error:

2015-12-23 03:07:45,638 FATAL [Thread-5] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: org.apache.thrift.protocol.TProtocol.getScheme()Ljava/lang/Class;
at com.xxx.yyy.thrift.dto.LatLong.read(LatLong.java:553)
at com.twitter.elephantbird.util.ThriftUtils.readSingleFieldNoTag(ThriftUtils.java:318)
at com.twitter.elephantbird.util.ThriftUtils.readFieldNoTag(ThriftUtils.java:352)
at com.twitter.elephantbird.mapreduce.input.RCFileThriftTupleInputFormat$TupleReader.getCurrentTupleValue(RCFileThriftTupleInputFormat.java:74)
at com.twitter.elephantbird.pig.load.RCFileThriftPigLoader.getNext(RCFileThriftPigLoader.java:46)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Here's my script:

REGISTER '/user/ameya/libs/geo-analysis-1.0.0-SNAPSHOT.jar';
REGISTER '/user/ameya/libs/libthrift-0.8.0.jar';
REGISTER '/user/ameya/libs/thrift-0.8-types-1.1.29-SNAPSHOT.jar';
REGISTER '/user/ameya/libs/libs/elephant-bird-pig-4.7.jar';
REGISTER '/user/ameya/libs/libs/elephant-bird-rcfile-4.7.jar';
REGISTER '/user/ameya/libs/libs/elephant-bird-core-4.7.jar';
REGISTER '/user/ameya/libs/libs/elephant-bird-hadoop-compat-4.7.jar';
REGISTER '/user/ameya/libs/libs/hive-0.4.1.jar';
REGISTER '/user/ameya/libs/libs/libs/hive-serde-0.13.3.jar';

SET output.compression.enabled true;
SET output.compression.codec org.apache.hadoop.io.compress.GzipCodec;

thrift = LOAD '$input' USING com.twitter.elephantbird.pig.load.RCFileThriftPigLoader('com.xxx.yyy.thrift.dto.LatLong');

final = FOREACH thrift GENERATE (requestLatLong is not null ? requestLatLong.latitude : null) AS req_ll_lat,
            (requestLatLong is not null ? requestLatLong.longitude : null) AS req_ll_lng;

STORE final INTO '$output';

I am using libthrift-0.8.0.jar, in which the TProtocol class does define a public getScheme() method. Interestingly, only a few of the mappers fail, not all of them, but that is enough to fail the job. Could this be a classpath issue?

I searched for this error but could not find any relevant answers. Any leads on how to fix it would be appreciated.

Upvotes: 0

Views: 3660

Answers (1)

Ameya

Reputation: 83

Found the reason. The class org.apache.thrift.protocol.TProtocol was defined in two jars: libthrift-0.8.0.jar and hive-0.4.1.jar. The copy in hive-0.4.1.jar does not define getScheme(). Whenever hive-0.4.1.jar came first on the classpath, that copy was loaded and the mappers failed because they could not find getScheme().

I am not sure why the behavior was not consistent across all mappers. Any comments explaining that would be helpful.

I replaced hive-0.4.1.jar with hive-exec-0.13.3.jar, and the issue was resolved.
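
For anyone hitting a similar NoSuchMethodError, one way to check for a duplicate class like this is to ask the class loader for every location that provides the class file. The following is only a minimal sketch: the class name ClassOriginFinder and its helper methods are hypothetical names chosen for illustration, and it assumes you run it with the same jars on the classpath that the job registers. The order it prints reflects the local class loader's search order, which may differ from the order used inside a task on the cluster.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper (not part of Pig, Hive, or Thrift).
public class ClassOriginFinder {

    // Converts a fully qualified class name into its classpath resource path.
    static String toResourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns every location visible to the loader that provides the class file.
    static List<URL> findDefinitions(String className, ClassLoader loader) {
        try {
            return Collections.list(loader.getResources(toResourcePath(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Defaults to the class from the stack trace; pass another name as the first argument.
        String name = args.length > 0 ? args[0] : "org.apache.thrift.protocol.TProtocol";
        List<URL> hits = findDefinitions(name, ClassOriginFinder.class.getClassLoader());
        if (hits.isEmpty()) {
            System.out.println(name + " was not found on the classpath");
        }
        for (URL hit : hits) {
            System.out.println(hit);
        }
        if (hits.size() > 1) {
            System.out.println("Warning: " + name + " is provided by " + hits.size() + " locations");
        }
    }
}
```

If more than one location is printed, the jars listed are candidates for the kind of conflict described above, and removing or replacing the one with the outdated copy is worth trying.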

Upvotes: 2
